Abstract

There is a recent surge of interest in identifying the sharp recovery thresholds for cluster
recovery under the stochastic block model. In this paper, we address the more refined question
of how many vertices will be misclassified on average. We consider the binary form of the
stochastic block model, where n vertices are partitioned into two clusters with edge probability
a/n within the first cluster, c/n within the second cluster, and b/n across clusters. Suppose that
as $n \to \infty$, $a = b + \mu\sqrt{b}$, $c = b + \nu\sqrt{b}$ for two fixed constants $\mu, \nu$, and $b \to \infty$ with $b = n^{o(1)}$.
When the cluster sizes are balanced and $\mu \neq \nu$, we show that the minimum fraction of misclassified
vertices on average is given by $Q(\sqrt{v^*})$, where $Q(x)$ is the Q-function for the standard normal distribution,
$v^*$ is the unique fixed point of $v = \frac{(\mu-\nu)^2}{16} + \frac{(\mu+\nu)^2}{16}\mathbb{E}[\tanh(v + \sqrt{v}Z)]$, and $Z$ is standard normal.
Moreover, the minimum misclassified fraction on average is attained by a local algorithm,
namely belief propagation, in time linear in the number of edges. Our proof techniques are based
on connecting the cluster recovery problem to tree reconstruction problems, and analyzing the
density evolution of belief propagation on trees with Gaussian approximations.


1 Introduction 

The problem of cluster recovery under the stochastic block model has been intensely studied in 
statistics [24, 44, 6, 8, 47, 19], computer science (where it is known as the planted partition problem)
[17, 25, 13, 31, 11, 12, 9, 4, 10], and theoretical statistical physics [14, 48, 15]. In the simplest
binary form, the stochastic block model assumes that n vertices are partitioned into two clusters
with edge probability a/n within the first cluster, c/n within the second cluster, and b/n across
the two clusters. The goal is to reconstruct the underlying clusters from the observation of the
graph. Different reconstruction goals can be considered depending on how the model parameters
a, b, c scale with n (see [2] for more discussion):

• Exact recovery (strong consistency). If the average degree is $\Theta(\log n)$, it is possible to exactly
recover the clusters (up to a permutation of cluster indices) with high probability. In the
case with two equal-sized clusters, $a = c = \alpha \log n$, and $b = \beta \log n$ for two fixed constants
$\alpha, \beta > 0$, a sharp exact recovery threshold $\sqrt{\alpha} - \sqrt{\beta} > \sqrt{2}$ has been found in [39, 1], and it
is further shown in [20, 5] that semi-definite programming can achieve the sharp threshold.
The threshold for two unequal-sized clusters is proved in [21]. The exact recovery threshold with
a fixed number of clusters has been identified in [21, 46, 3], and more generally in [2, 42] with
heterogeneous cluster sizes and edge probabilities.

• Weak recovery (weak consistency). If the average degree is $\omega(1)$, one can hope for misclassifying
only $o(n)$ vertices with high probability, which is known as weak recovery or weak
consistency. In the setting with two approximately equal-sized clusters and $a = c$, it is shown
in [45, 39] that weak recovery is possible if and only if $(a - b)^2/(a + b) \to \infty$.

• Correlated recovery (non-trivial detection). If the average degree is $O(1)$, exact recovery or
weak recovery becomes hopeless, as the resulting graph under the stochastic block model
will have at least a constant fraction of isolated vertices. Moreover, it is easy to see that
even vertices with constant degree cannot be labeled accurately, even when all the other vertices'
labels are revealed. Thus one goal in the sparse graph regime is to find a partition positively
correlated with the true one (up to a permutation of cluster indices), which is also called
non-trivial detection. In the setting with two approximately equal-sized clusters and $a = c$, it
was first conjectured in [14] and later proven in [40, 37, 30] that correlated recovery is feasible
if and only if $(a - b)^2 > 2(a + b)$. A spectral method based on the non-backtracking matrix
was recently shown to achieve the sharp threshold in [7].

J. Xu is with the Simons Institute for the Theory of Computing, University of California, Berkeley, Berkeley, CA,
jiamingxu@berkeley.edu.

In practice, one may be interested in the finer question of how many vertices will be
misclassified in expectation or with high probability. In the setting with two equal-sized clusters, previous
results on exact recovery, weak recovery, and correlated recovery provide conditions under which
the minimum fraction of misclassified vertices on average is $o(1/n)$, $o(1)$, and strictly smaller than
$1/2$, respectively. Assuming $(a - b)^2/(a + b) \to \infty$, recent work [47, 19] shows that the
expected misclassified fraction decays to zero exponentially fast and gives a sharp characterization of
the decay exponent under a minimax framework. However, none of these previous results shed
light on the important question of when it is possible to misclassify only an $\epsilon$ fraction of vertices in
expectation, for any fixed $\epsilon \in (0, 1/2)$. To the best of our knowledge, finding a closed-form expression
for the expected misclassified fraction in terms of the model parameters has remained an open problem.
In this paper, we give such a simple formula in the special case of two approximately equal-sized
clusters. Specifically, suppose that

$$a = b + \sqrt{b}\,\mu, \quad c = b + \sqrt{b}\,\nu, \quad b \to \infty, \quad b = n^{o(1)}, \tag{1}$$

for two fixed constants $\mu, \nu$. We further assume that $\mu \neq \nu$, so that the vertex degrees are statistically
correlated with the cluster structure; hence the name degree-correlated stochastic block
model. We show that the minimum fraction of misclassified vertices on average is given by $Q(\sqrt{v^*})$,
where $Q(x)$ is the Q-function for the standard normal distribution, $v^*$ is the unique fixed point of
$v = \frac{(\mu-\nu)^2}{16} + \frac{(\mu+\nu)^2}{16}\mathbb{E}[\tanh(v + \sqrt{v}Z)]$, and $Z$ is standard normal. Moreover, the minimum expected misclassified
fraction can be attained by a local algorithm, namely the belief propagation (BP) algorithm (see
Algorithm 1), in time linear in the number of edges. Local belief propagation can be viewed as an iterative
algorithm that reduces the expected misclassified fraction step by step; running belief
propagation for one iteration reduces to simple thresholding based on vertex degrees.
It is crucial to assume $\mu \neq \nu$ for the above results to hold; otherwise it is well known (see e.g. [26])
that no local algorithm can even achieve non-trivial detection. Nevertheless, under the slightly
stronger assumption that $b \to \infty$ and $b = o(\log n)$, we show that if $\mu = \nu$ with $|\mu| > 2$, local belief
propagation combined with a global algorithm capable of non-trivial detection when $|\mu| > 2$ attains
the minimum expected misclassified fraction $Q(\sqrt{\bar{v}})$ in polynomial time, where $\bar{v}$ is the largest fixed
point of $v = \frac{\mu^2}{4}\mathbb{E}[\tanh(v + \sqrt{v}Z)]$.

When the cluster sizes are unbalanced, i.e., one cluster is of size approximately $pn$ for
$p \in (0, 1/2)$, we give a lower bound on the minimum expected misclassified fraction, and an upper
bound attained by the local belief propagation algorithm. However, we are unable to prove that
the upper bound matches the lower bound. In fact, numerical experiments suggest that there exists
a gap between the upper and lower bounds when the cluster sizes are very unbalanced, i.e., $p$ is close
to 0.

Our proofs are mainly based on two useful techniques introduced in previous work. First, in 
the regime (1), the observed graph is locally tree-like, so we connect the cluster recovery problem 
to reconstruction problems on trees, and for the tree problems, the optimal estimator can be
computed by the belief propagation algorithm. Such a connection has been investigated before in [40,
36, 41]. Second, we characterize the density evolution of belief propagation on trees with Gaussian
approximations, and as a result, we get a recursion whose largest fixed point corresponds
to a lower bound on the minimum expected misclassified fraction, and whose smallest fixed point
corresponds to the expected misclassified fraction attained by the local belief propagation algorithm.
Density evolution has been widely used for the analysis of multiuser detection [35] and sparse graph
codes [43, 32], and more recently has been introduced for the analysis of finding a single community
in a sparse graph [34]. As a final piece, we prove that in the balanced cluster case, the recursion
has a unique fixed point using the ideas of symmetric random variables [43, 33] and first-order
stochastic dominance, thus establishing the tightness of the lower bound and the optimality of
local BP simultaneously.

We point out that local algorithms are by themselves a thriving research area (see [29, 23, 18] and the
references therein). Intuitively, local algorithms are algorithms that make a decision for
each vertex based only on a neighborhood of small radius around the vertex; by design, these algorithms
are easy to run in a distributed fashion. In the context of community detection, local
algorithms determine which community each vertex lies in based only on the local neighborhood
around each vertex (see [34] for a formal definition). Recent work [41] shows that, with the aid of
extra noisy label information on the cluster structure, local algorithms can be optimal in minimizing
the expected misclassified fraction in the stochastic block model. In comparison, we show that when
the vertex degrees are correlated with the cluster structure, local algorithms can be optimal
even without the extra noisy label information.

In closing, we compare our results with the recent results in [34, 22], which studied the problem 
of finding a single community of size $pn$ in a sparse graph. When $\nu = 0$, i.e., $b = c$, the stochastic
block model considered in this paper specializes to the single community model studied in [34],
and the recursion of density evolution derived in this paper reduces to the recursion derived in [34,
Eq. (36)]. It is shown in [34, 22] that the local algorithm is strictly suboptimal compared to
global exhaustive search when $p \to 0$. In contrast, we show that if $p = 1/2$, the local algorithm is
optimal in minimizing the expected fraction of misclassified vertices as long as $\mu \neq \nu$, and we give a
sharp characterization of the minimum expected misclassified fraction.

Parallel Independent Work The problem of cluster recovery under the degree-correlated stochastic
block model with multiple clusters was independently studied in [49]. Based on the cavity
method and numerical simulations, it is shown that with at most four clusters of unequal sizes but
the same in and out degrees, the non-trivial detection threshold phenomenon disappears, making the
minimum fraction of misclassified vertices on average a continuous function of the model parameters.
In comparison, in the regime (1) with two equal-sized clusters and $\mu \neq \nu$, we give a more precise
answer, showing that the fraction of misclassified vertices on average is $Q(\sqrt{v^*})$, where $v^*$ is the
unique fixed point of $v = \frac{(\mu-\nu)^2}{16} + \frac{(\mu+\nu)^2}{16}\mathbb{E}[\tanh(v + \sqrt{v}Z)]$. Moreover, it is shown in [49] that with
more than four clusters of unequal sizes, there exists a regime where two stable fixed points coexist,
with the smaller one corresponding to the performance of local belief propagation, and the larger
one corresponding to the performance of belief propagation initialized based on the true cluster
structure. We find that the same phenomenon also happens in the case of two clusters with very
unbalanced sizes and different in and out degrees (see Section 2.4 for details).

We recently became aware of the work [16], which studied the problem of cluster recovery under
the stochastic block model in the symmetric setting with two equal-sized clusters and $a = c$.
Assuming that $\frac{(a-b)^2}{2(a+b)(1-(a+b)/(2n))} = \frac{\mu^2}{4}$ for a fixed constant $\mu$ and $(a + b)(1 - (a + b)/(2n)) \to \infty$,
[16] gives a sharp characterization of the per-vertex mutual information between the vertex labels and the
graph in terms of $\mu$ and $\bar{v}$, where $\bar{v}$ is the largest fixed point of $v = \frac{\mu^2}{4}\mathbb{E}[\tanh(v + \sqrt{v}Z)]$.
In comparison, we show that the minimum fraction of vertices misclassified in expectation is given
by $Q(\sqrt{\bar{v}})$ and that it is attainable in polynomial time under the additional technical assumption
$b = o(\log n)$. Interestingly, point (a) of Lemma 6.1 in [16] is a special case of Lemma 4.3 with
$p = 1/2$ in our paper. The proof of Lemma 6.1 given in [16] and the proof of Lemma 4.3 given
in this paper are similar: both use the ideas of symmetric random variables [43, 33]. One slight
difference is that to prove the concavity of the mapping in the recursion when $p = 1/2$, we use
first-order stochastic dominance, while [16] computes the second-order derivative.

2 Model and Main Results 

We consider the binary stochastic block model with n vertices partitioned into two clusters, where 
each vertex is independently assigned to the first cluster with probability $p \in (0, 1)$ and to the
second cluster with probability $\bar{p} = 1 - p$.¹ Each pair of vertices is connected independently with
probability $a/n$ if both vertices are in the first cluster, with probability $c/n$ if both are in the second
cluster, and with probability $b/n$ if they are in different clusters. Let $G = (V, E)$ denote the
observed graph and $A$ denote the adjacency matrix of the graph $G$. Let $\sigma$ denote the underlying
vertex labeling such that $\sigma_i = +$ if vertex $i$ is in the first cluster and $\sigma_i = -$ otherwise. The model
parameters $\{p, a, b, c\}$ are assumed to be known, and the goal is to estimate the vertex labeling $\sigma$
from the observation of $G$. More precisely, we have the following definition.

¹Notice that the cluster sizes are random and concentrate on $pn$ and $(1-p)n$. A slightly different model assumes
that the vertices are partitioned into two clusters of deterministic sizes, exactly given by $pn$ and $(1-p)n$. The two
models behave similarly, but for ease of analysis, we focus on the random cluster size model in this paper.

Definition 2.1. The reconstruction problem on the graph is the problem of inferring $\sigma$ from the
observation of $G$. The expected fraction of vertices misclassified by an estimator $\hat{\sigma}$ is given by

$$p_G(\hat{\sigma}) = \frac{1}{n}\sum_{i=1}^{n} \mathbb{P}\{\hat{\sigma}_i \neq \sigma_i\}. \tag{2}$$

Let $p^*_G$ denote the minimum expected misclassified fraction among all possible estimators based on
$G$.

The optimal estimator for minimizing the error probability $\mathbb{P}\{\hat{\sigma}_i \neq \sigma_i\}$ is the maximum a posteriori
(MAP) estimator, which is given by $\hat{\sigma}_i = 2 \times \mathbf{1}\{\mathbb{P}\{\sigma_i = + \mid G\} \geq \mathbb{P}\{\sigma_i = - \mid G\}\} - 1$, and the minimum error
probability is given by $\frac{1}{2} - \frac{1}{2}\mathbb{E}\left[\left|\mathbb{P}\{\sigma_i = + \mid G\} - \mathbb{P}\{\sigma_i = - \mid G\}\right|\right]$. Hence, the minimum expected
misclassified fraction $p^*_G$ is given by

$$\begin{aligned} p^*_G &= \frac{1}{2} - \frac{1}{2n}\sum_{i=1}^{n} \mathbb{E}\left[\left|\mathbb{P}\{\sigma_i = + \mid G\} - \mathbb{P}\{\sigma_i = - \mid G\}\right|\right] \\ &= \frac{1}{2} - \frac{1}{2}\mathbb{E}\left[\left|\mathbb{P}\{\sigma_1 = + \mid G\} - \mathbb{P}\{\sigma_1 = - \mid G\}\right|\right], \end{aligned} \tag{3}$$

where the second equality holds due to the symmetry among vertices. In the special case with
$p = 1/2$ and $a = c$, the two clusters are symmetric; thus $p^*_G = 1/2$ and one can only hope to
estimate $\sigma$ up to a global flip of sign. In general, computing the MAP estimator is computationally
intractable, and it is unclear whether the minimum expected misclassified fraction $p^*_G$ can be achieved
by some estimator computable in polynomial time.
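To make (3) concrete, here is a minimal Python sketch that assumes the posterior marginals $\pi_i = \mathbb{P}\{\sigma_i = + \mid G\}$ are already available (obtaining them is exactly the intractable step mentioned above). The helper names `map_labels` and `conditional_map_error` are hypothetical. The sketch relies on the identity $\min\{\pi, 1 - \pi\} = \frac{1}{2} - \frac{1}{2}|2\pi - 1|$, which is the per-vertex term inside (3) before the expectation over $G$.

```python
def map_labels(posteriors):
    """MAP labels from posterior marginals pi_i = P{sigma_i = + | G}.

    Ties (pi_i = 1/2) go to +1, matching 2 * 1{pi >= 1 - pi} - 1.
    """
    return [1 if pi >= 1.0 - pi else -1 for pi in posteriors]


def conditional_map_error(posteriors):
    """Average conditional error of the MAP rule given G.

    Each term is 1/2 - 1/2 * |P{+|G} - P{-|G}| = 1/2 - 1/2 * |2 pi - 1|,
    i.e. the summand of (3) before taking the expectation over G.
    """
    n = len(posteriors)
    return sum(0.5 - 0.5 * abs(2.0 * pi - 1.0) for pi in posteriors) / n


if __name__ == "__main__":
    post = [0.9, 0.2, 0.5]
    print(map_labels(post))             # [1, -1, 1]
    print(conditional_map_error(post))  # (0.1 + 0.2 + 0.5) / 3, about 0.2667
```

A posterior of exactly 1/2 contributes the worst possible conditional error 1/2, which is why the symmetric case $p = 1/2$, $a = c$ gives $p^*_G = 1/2$.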

Throughout this paper, we assume that $p$ is fixed and focus on the regime (1). As the average
degree is $n^{o(1)}$, it is well known that a local neighborhood of a vertex is a tree with high probability.
Thus, it is natural to study local algorithms. More precisely, in the next subsection we consider a local belief
propagation algorithm to approximate the MAP estimator.

2.1 Local Belief Propagation Algorithm 

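Our local belief propagation algorithm is given in Algorithm 1. Specifically, let $\partial i$ denote the set
of neighbors of $i$, and

$$F(x) = \frac{1}{2}\log\frac{p a e^{2x} + \bar{p} b}{p b e^{2x} + \bar{p} c}.$$

Let $d_+ = pa + \bar{p}b$ and $d_- = pb + \bar{p}c$ denote the expected vertex degree in the first and second cluster,
respectively. Define the message transmitted from vertex $i$ to vertex $j$ at iteration $t + 1$ as

$$R^{t+1}_{i \to j} = -\frac{d_+ - d_-}{2} + \sum_{\ell \in \partial i \setminus \{j\}} F(R^t_{\ell \to i}), \tag{4}$$

with initial conditions $R^0_{i \to j} = 0$ for all $i \in [n]$ and $j \in \partial i$. Then we define the belief of vertex $u$ at
the $t$-th iteration to be

$$R^t_u = -\frac{d_+ - d_-}{2} + \sum_{\ell \in \partial u} F(R^{t-1}_{\ell \to u}). \tag{5}$$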

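Algorithm 1 Belief propagation for cluster recovery

1: Input: $n \in \mathbb{N}$, $p \in (0,1)$, $a, b, c > 0$, adjacency matrix $A \in \{0,1\}^{n \times n}$, and $t \in \mathbb{N}$.

2: Initialize: Set $R^0_{i \to j} = 0$ for all $i \in [n]$ and $j \in \partial i$.

3: Run $t - 1$ iterations of message passing as in (4) to compute $R^{t-1}_{i \to j}$ for all $i \in [n]$ and $j \in \partial i$.
4: Compute $R^t_i$ for all $i \in [n]$ as per (5).

5: Return $\hat{\sigma}_{BP}$ with $\hat{\sigma}_{BP}(i) = 2 \times \mathbf{1}\{R^t_i \geq -\phi\} - 1$, where $\phi = \frac{1}{2}\log\frac{p}{\bar{p}}$.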


As we will show in Section 3.1, the message passing as in (4) and (5) exactly computes the
log likelihood ratio for the problem of inferring $\sigma_u$ on a suitably defined tree model with root $u$.
Moreover, in the regime (1), there exists a coupling such that the local neighborhood of a fixed
vertex $u$ is the same as the tree model rooted at $u$ with high probability. These two observations
together suggest that $R^t_u$ is a good approximation of $\frac{1}{2}\log\frac{\mathbb{P}\{G \mid \sigma_u = +\}}{\mathbb{P}\{G \mid \sigma_u = -\}}$, and thus we can estimate $\sigma_u$
by thresholding $R^t_u$ at the optimal threshold $-\phi = \frac{1}{2}\log\frac{\bar{p}}{p}$ according to the MAP rule.

We can see from (4) that in each BP iteration, each vertex $i$ needs to compute $|\partial i|$ outgoing
messages. To this end, $i$ can first compute $R^t_i$ according to (5), and then subtract $F(R^{t-1}_{j \to i})$ from
$R^t_i$ to get $R^t_{i \to j}$ for every neighbor $j$ of $i$. In this way, each BP iteration runs in time $O(m)$, where
$m$ is the total number of edges. Hence $\hat{\sigma}_{BP}$ can be computed in time $O(tm)$.
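The following is a minimal, unoptimized Python sketch of Algorithm 1, assuming the graph is given as a duplicate-free edge list and taking $F(x) = \frac{1}{2}\log\frac{pae^{2x} + \bar{p}b}{pbe^{2x} + \bar{p}c}$, $d_+ = pa + \bar{p}b$, $d_- = pb + \bar{p}c$ and $\phi = \frac{1}{2}\log(p/\bar{p})$. The function names `F`, `bp_cluster_recovery` and `sample_sbm` are our own; `sample_sbm` is a naive $O(n^2)$ sampler included only so the example can run. Outgoing messages are obtained with the subtraction trick described above, so each iteration costs $O(m)$ evaluations of $F$.

```python
import math
import random


def F(x, p, a, b, c):
    """F(x) = 1/2 log((p a e^{2x} + (1-p) b) / (p b e^{2x} + (1-p) c)).

    Rewritten to avoid overflow of e^{2x}; also valid for x = +/- inf.
    """
    q = 1.0 - p
    if x >= 0:
        e = math.exp(-2.0 * x)  # divide numerator and denominator by e^{2x}
        return 0.5 * math.log((p * a + q * b * e) / (p * b + q * c * e))
    e = math.exp(2.0 * x)
    return 0.5 * math.log((p * a * e + q * b) / (p * b * e + q * c))


def bp_cluster_recovery(n, p, a, b, c, edges, t):
    """Sketch of Algorithm 1: t-1 rounds of messages (4), then beliefs (5)."""
    nbrs = [[] for _ in range(n)]
    for i, j in edges:
        nbrs[i].append(j)
        nbrs[j].append(i)
    q = 1.0 - p
    d_plus, d_minus = p * a + q * b, p * b + q * c
    base = -(d_plus - d_minus) / 2.0
    # msg[(i, j)] holds R_{i -> j}; all messages start at 0.
    msg = {(i, j): 0.0 for i in range(n) for j in nbrs[i]}

    def beliefs(m):
        # Belief (5): base term plus F of every incoming message.
        return [base + sum(F(m[(k, i)], p, a, b, c) for k in nbrs[i])
                for i in range(n)]

    for _ in range(t - 1):
        R = beliefs(msg)
        # Message (4) via subtraction: R_{i->j} = R_i - F(R_{j->i}) (old messages).
        msg = {(i, j): R[i] - F(msg[(j, i)], p, a, b, c) for (i, j) in msg}
    R = beliefs(msg)
    phi = 0.5 * math.log(p / q)
    return [1 if r >= -phi else -1 for r in R]


def sample_sbm(n, p, a, b, c, rng):
    """Naive O(n^2) sampler for the binary SBM (illustration only)."""
    sigma = [1 if rng.random() < p else -1 for _ in range(n)]
    edges = []
    for i in range(n):
        for j in range(i + 1, n):
            if sigma[i] == sigma[j]:
                prob = (a if sigma[i] == 1 else c) / n
            else:
                prob = b / n
            if rng.random() < prob:
                edges.append((i, j))
    return sigma, edges
```

For example, `sigma, edges = sample_sbm(400, 0.5, 30.0, 10.0, 5.0, random.Random(0))` followed by `bp_cluster_recovery(400, 0.5, 30.0, 10.0, 5.0, edges, 3)` runs three BP iterations on a small degree-correlated instance. With `t = 1` the message loop does not run and each belief is $-(d_+ - d_-)/2 + |\partial i|\,F(0)$, so the sketch reduces to thresholding vertex degrees, in line with the remark in the introduction.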

Finally, notice that Algorithm 1 needs to know the parameters $\{p, a, b, c\}$. For the main
results of this paper to continue to hold, the parameter values only need to be known up
to $o(1)$ additive errors. In fact, there exists a fully data-driven procedure to consistently estimate
those parameters; see, e.g., [21, Appendix B].
2.2 Main Results

The following theorem characterizes the expected fraction of vertices misclassified by $\hat{\sigma}^t_{BP}$ as $n \to \infty$;
it also gives a lower bound on the minimum expected misclassified fraction as $n \to \infty$. Furthermore,
in the case $p = 1/2$ and $\mu \neq \nu$, $\hat{\sigma}^t_{BP}$ achieves the lower bound as $t \to \infty$ after $n \to \infty$.

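Theorem 2.2. Assume $p \in (0,1)$ is fixed and consider the regime (1). Let

$$h(v) = \mathbb{E}\left[\tanh\left(v + \sqrt{v}Z + \phi\right)\right],$$

where $Z \sim \mathcal{N}(0, 1)$ and $\phi = \frac{1}{2}\log\frac{p}{\bar{p}}$. Let $\lambda = \frac{p(\mu+\nu)^2}{8}$ and $\theta = \frac{p(\mu-\nu)^2}{8} + \frac{(1-2p)\nu^2}{4}$. Define $\underline{v}$ and $\bar{v}$
to be the smallest and largest fixed points of

$$v = \theta + \lambda h(v),$$

respectively.² Define $\{v_t : t \geq 0\}$ recursively by $v_0 = 0$ and $v_{t+1} = \theta + \lambda h(v_t)$. Let $\hat{\sigma}^t_{BP}$ denote
the estimator given by belief propagation applied for $t$ iterations, as defined in Algorithm 1. Then
$\lim_{t\to\infty} v_t = \underline{v}$, $(p\mu - \bar{p}\nu)^2/4 \leq \underline{v} \leq \bar{v} \leq (p\mu^2 + \bar{p}\nu^2)/4$, and

$$\lim_{n\to\infty} p_G(\hat{\sigma}^t_{BP}) = p\,Q\left(\frac{v_t + \phi}{\sqrt{v_t}}\right) + (1 - p)\,Q\left(\frac{v_t - \phi}{\sqrt{v_t}}\right),$$

$$\liminf_{n\to\infty} p^*_G \geq p\,Q\left(\frac{\bar{v} + \phi}{\sqrt{\bar{v}}}\right) + (1 - p)\,Q\left(\frac{\bar{v} - \phi}{\sqrt{\bar{v}}}\right),$$

where $Q(x) = \int_x^\infty \frac{1}{\sqrt{2\pi}}e^{-y^2/2}\,dy$. Moreover, if $p = 1/2$ and $\mu \neq \nu$, then $\underline{v} = \bar{v} = v^*$, and thus

$$\lim_{t\to\infty}\lim_{n\to\infty} p_G(\hat{\sigma}^t_{BP}) = \lim_{n\to\infty} p^*_G = Q\left(\sqrt{v^*}\right),$$

where $v^*$ is the unique fixed point of $v = \frac{(\mu-\nu)^2}{16} + \frac{(\mu+\nu)^2}{16}\mathbb{E}[\tanh(v + \sqrt{v}Z)]$.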
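For readers who want to evaluate these quantities numerically, here is a small Python sketch. It assumes NumPy and approximates $h$ by Gauss-Hermite quadrature (a numerical choice of ours, not part of the analysis). It uses $\lambda = p(\mu+\nu)^2/8$ and $\theta = p(\mu-\nu)^2/8 + (1-2p)\nu^2/4$; with these values $\theta + \lambda h(0) = (p\mu - \bar{p}\nu)^2/4$ and $\theta + \lambda = (p\mu^2 + \bar{p}\nu^2)/4$, matching the two bounds in the theorem, and at $p = 1/2$ they reduce to $(\mu-\nu)^2/16$ and $(\mu+\nu)^2/16$. Starting the iteration at $v_0 = 0$ tracks local BP, while starting at the upper bound tracks the largest fixed point, assuming $h$ is non-decreasing. All function names are hypothetical.

```python
import math

import numpy as np

# Probabilists' Gauss-Hermite rule: weight exp(-z^2/2), weights sum to sqrt(2*pi).
_NODES, _WEIGHTS = np.polynomial.hermite_e.hermegauss(80)


def h(v, phi):
    """Quadrature approximation of h(v) = E[tanh(v + sqrt(v) Z + phi)], Z ~ N(0, 1)."""
    v = max(v, 0.0)  # guard against tiny negative round-off
    vals = np.tanh(v + math.sqrt(v) * _NODES + phi)
    return float(np.dot(_WEIGHTS, vals) / math.sqrt(2.0 * math.pi))


def Q(x):
    """Standard normal tail probability Q(x) = P{Z > x}."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def theta_lambda(p, mu, nu):
    """theta and lambda in the form assumed here (see the lead-in)."""
    lam = p * (mu + nu) ** 2 / 8.0
    theta = p * (mu - nu) ** 2 / 8.0 + (1.0 - 2.0 * p) * nu ** 2 / 4.0
    return theta, lam


def density_evolution(p, mu, nu, v0, iters=2000):
    """Iterate v <- theta + lambda * h(v) from v0 and return the last iterate."""
    theta, lam = theta_lambda(p, mu, nu)
    phi = 0.5 * math.log(p / (1.0 - p))
    v = v0
    for _ in range(iters):
        v = theta + lam * h(v, phi)
    return v


def error_from_v(p, v):
    """p Q((v + phi)/sqrt(v)) + (1 - p) Q((v - phi)/sqrt(v)); trivial error if v = 0."""
    if v <= 0.0:
        return min(p, 1.0 - p)
    phi = 0.5 * math.log(p / (1.0 - p))
    s = math.sqrt(v)
    return p * Q((v + phi) / s) + (1.0 - p) * Q((v - phi) / s)


if __name__ == "__main__":
    p, mu, nu = 0.5, 3.0, 1.0
    v_low = density_evolution(p, mu, nu, 0.0)  # local BP, v_0 = 0
    v_high = density_evolution(p, mu, nu, (p * mu**2 + (1 - p) * nu**2) / 4.0)
    print(v_low, v_high, error_from_v(p, v_low))
```

In the balanced example the two starting points should converge to the same value, as the uniqueness statement of the theorem predicts; in strongly unbalanced cases they can differ.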

If $p\mu \neq \bar{p}\nu$, so that the vertex degrees are statistically correlated with the cluster structure, we
have $\underline{v} > 0$ and thus $\lim_{t\to\infty}\lim_{n\to\infty} p_G(\hat{\sigma}^t_{BP}) < \min\{p, 1 - p\}$. Hence, the local application of BP
strictly outperforms the trivial estimator, which always guesses the label of all vertices to be $+1$
if $p > 1/2$ and $-1$ if $p < 1/2$. In the balanced case $p = 1/2$, local BP achieves the minimum
expected misclassified fraction. Numerical experiments further indicate that local BP is still
optimal in the unbalanced case provided that $p$ is not close to 0 or 1; however, we do not have a
proof (see Section 2.4 for more discussion).

If $p\mu = \bar{p}\nu$, then $\underline{v} = 0$ and thus

$$\lim_{t\to\infty}\lim_{n\to\infty} p_G(\hat{\sigma}^t_{BP}) = \min\{p, 1 - p\}.$$

In this case, our local application of BP cannot do better than the trivial estimator. In fact, the
local neighborhoods are statistically uncorrelated with the cluster structure, and one can further
argue that no local algorithm can achieve non-trivial detection (see e.g. [26]). Although local
algorithms are bound to fail, there might still exist efficient global algorithms that achieve
the minimum expected misclassified fraction. The following theorem shows that this is indeed the
case when $p = 1/2$, $\mu = \nu$, and $b = o(\log n)$.

²The existence of fixed points of $v \mapsto \theta + \lambda h(v)$ follows from Brouwer's fixed-point theorem and the fact that
$|h(v)| \leq 1$.
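Theorem 2.3. Assume $p = 1/2$, $a = c = b + \sqrt{b}\,\mu$ for some fixed constant $\mu$, and $b \to \infty$ such that
$b = o(\log n)$. For an estimator $\hat{\sigma}$ based on graph $G$, define the fraction of vertices misclassified by
$\hat{\sigma}$ as

$$\mathcal{O}(\hat{\sigma}, \sigma) = \frac{1}{n}\min\left\{\sum_{i=1}^{n}\mathbf{1}\{\hat{\sigma}_i \neq \sigma_i\},\ \sum_{i=1}^{n}\mathbf{1}\{\hat{\sigma}_i \neq -\sigma_i\}\right\}. \tag{6}$$

If $|\mu| > 2$, then

$$\lim_{n\to\infty}\inf_{\hat{\sigma}} \mathbb{E}\left[\mathcal{O}(\hat{\sigma}, \sigma)\right] = Q\left(\sqrt{\bar{v}}\right), \tag{7}$$

where the infimum ranges over all possible estimators $\hat{\sigma}$ based on graph $G$, and $\bar{v} > 0$ is the largest fixed
point of $v = \frac{\mu^2}{4}\mathbb{E}[\tanh(v + \sqrt{v}Z)]$. Moreover, there is a polynomial-time estimator such that for
every $\epsilon > 0$, it misclassifies at most a $Q(\sqrt{\bar{v}}) + \epsilon$ fraction of vertices in expectation.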

In contrast to (2), the fraction of vertices misclassified by $\hat{\sigma}$ is defined in (6) up to a global flip of the sign
of $\hat{\sigma}$. This is because in the case $p = 1/2$ and $a = c$, due to the symmetry between $+$ and
$-$, $\sigma$ and $-\sigma$ have the same distribution conditional on graph $G$. Thus, it is impossible to reliably
estimate the global sign of $\sigma$ based on graph $G$.
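Because labels take values in $\{+1, -1\}$, the second sum in (6) equals $n$ minus the number of mismatches, so the flip-invariant error is a one-liner. A minimal Python sketch (the function name `flip_invariant_error` is ours):

```python
def flip_invariant_error(sigma_hat, sigma):
    """Fraction misclassified up to a global sign flip, as in (6).

    With +/-1 labels, sum_i 1{sigma_hat_i != -sigma_i} = n - (#mismatches).
    """
    n = len(sigma)
    if len(sigma_hat) != n:
        raise ValueError("label vectors must have the same length")
    mismatches = sum(1 for s_hat, s in zip(sigma_hat, sigma) if s_hat != s)
    return min(mismatches, n - mismatches) / n


if __name__ == "__main__":
    print(flip_invariant_error([1, -1, 1, 1], [-1, 1, -1, 1]))  # 0.25
```

By construction the value never exceeds 1/2 and is unchanged when $\hat{\sigma}$ is replaced by $-\hat{\sigma}$.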

Note that $|\mu| = 2$ corresponds to the Kesten-Stigum bound [27]. It is shown in [40] that if
$|\mu| \leq 2$, correlated recovery is impossible, and thus the minimum expected misclassified fraction
is asymptotically $1/2$; remarkably, [30, 37, 7] prove that correlated recovery is efficiently achievable if $|\mu| > 2$.
Our results further show that in this case, with $b \to \infty$ and $b = o(\log n)$, the minimum expected
misclassified fraction is $Q(\sqrt{\bar{v}})$ and it can also be attained in polynomial time. The proof of
Theorem 2.3 is mainly based on two observations. First, it is shown in [36] that local BP is
able to improve a clustering that is slightly better than a random guess to achieve the minimum
expected misclassified fraction if $|\mu| > C$ for a universal constant $C > 0$. Second, we find that if
$|\mu| > 2$, the recursion $v = \frac{\mu^2}{4}\mathbb{E}[\tanh(v + \sqrt{v}Z)]$ derived in the density evolution analysis of local
BP has only two fixed points: $0$ and $\bar{v} > 0$, where $0$ is unstable and $\bar{v}$ is stable. These two results
together establish that if $|\mu| > 2$, then running local BP for $t$ iterations with a correlated
initialization provided by a non-trivial detection algorithm attains the minimum expected
misclassified fraction $Q(\sqrt{\bar{v}})$ as $t \to \infty$.

2.3 Proof Ideas 

The proof of Theorem 2.2 is based on two useful tools. First, we connect the cluster recovery
problem to the reconstruction problem on trees. Second, we use density evolution with Gaussian
approximations to give a sharp characterization of the error probabilities of tree reconstruction
problems in terms of fixed points of a recursion.

• To bound from below the minimum expected misclassified fraction, we bound the error probability
of inferring $\sigma_u$ for a specific vertex $u$. Following [40], we consider an oracle estimator
which, in addition to the graph structure, observes the exact labels of all vertices at distance exactly
$t$ from $u$. As in [40], it is possible to show that the best estimator in this
case is given by BP for $t$ levels initialized using the exact labels at distance $t$. In contrast, the
expected fraction of vertices misclassified by the local BP algorithm approximately equals
the error probability of inferring $\sigma_u$ solely based on the neighborhood of vertex $u$ of radius $t$,
without access to the exact labels of vertices at distance $t$.
• We characterize the density evolution of local BP and of the BP lower bound using Gaussian
approximations, and obtain a recursion whose largest fixed point corresponds to the BP
lower bound and whose smallest fixed point corresponds to the expected fraction of vertices
misclassified by local BP as $t \to \infty$. In the balanced cluster case, we further show that
the recursion has a unique fixed point, and thus the BP lower bound matches the
expected fraction of vertices misclassified by local BP.

2.4 Numerical Experiments and Open Problems 



Figure 1: Numerical calculations of $h(v)$ ($y$ axis) versus $v \in [0, 6]$ ($x$ axis) for different values of $p$. It shows
that $h(v)$ is concave when $p > 0.2$ and becomes convex for small $v$ when $p < 0.1$.

In the case with $p = 1/2$ and $\mu \neq \nu$, we show that $v = \theta + \lambda h(v)$ has a unique fixed point and
thus local BP is optimal; the key idea is to prove that $h(v)$ is concave in this case. Numerical
calculations, depicted in Fig. 1, show that $h(v)$ is still concave if $p > 0.2$, suggesting that local
BP is still optimal when the cluster sizes are roughly balanced. However, $h(v)$ becomes convex for small $v$
when $p < 0.1$.

It is intriguing to investigate when $v = \theta + \lambda h(v)$ has a unique fixed point. If $p = 0.01$, numerical
experiments, depicted in Fig. 2, show that $v = \theta + \lambda h(v)$ may have multiple fixed points, suggesting
that local BP may be suboptimal. However, in the case with $\mu = \nu$ and $p \neq 1/2$, numerical
experiments indicate that there is always a unique fixed point.
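One simple way to explore such experiments is to scan $g(v) = \theta + \lambda h(v) - v$ for sign changes on $[0, \theta + \lambda]$, an interval that contains every fixed point because $|h| \leq 1$. The Python sketch below is our own illustration, not the procedure used to produce Fig. 2: it assumes NumPy, Gauss-Hermite quadrature for $h$, and the forms $\lambda = p(\mu+\nu)^2/8$ and $\theta = p(\mu-\nu)^2/8 + (1-2p)\nu^2/4$; `fixed_points` is a hypothetical name. It can miss roots that are closer together than the grid spacing or that touch zero without crossing it.

```python
import math

import numpy as np

_NODES, _WEIGHTS = np.polynomial.hermite_e.hermegauss(80)


def h(v, phi):
    """Quadrature approximation of h(v) = E[tanh(v + sqrt(v) Z + phi)], Z ~ N(0, 1)."""
    v = max(v, 0.0)
    vals = np.tanh(v + math.sqrt(v) * _NODES + phi)
    return float(np.dot(_WEIGHTS, vals) / math.sqrt(2.0 * math.pi))


def fixed_points(p, mu, nu, grid=4000, tol=1e-10):
    """Approximate the fixed points of v = theta + lambda * h(v) on [0, theta + lambda].

    Scans g(v) = theta + lambda * h(v) - v on a uniform grid and refines every
    bracket with a sign change by bisection.
    """
    lam = p * (mu + nu) ** 2 / 8.0                                     # assumed form
    theta = p * (mu - nu) ** 2 / 8.0 + (1.0 - 2.0 * p) * nu ** 2 / 4.0  # assumed form
    phi = 0.5 * math.log(p / (1.0 - p))

    def g(v):
        return theta + lam * h(v, phi) - v

    xs = np.linspace(0.0, theta + lam, grid + 1)
    gs = [g(float(x)) for x in xs]
    roots = []
    for k in range(grid):
        lo, hi, g_lo = float(xs[k]), float(xs[k + 1]), gs[k]
        if g_lo == 0.0:
            roots.append(lo)
        elif g_lo * gs[k + 1] < 0.0:
            while hi - lo > tol:  # bisection keeps a sign change inside [lo, hi]
                mid = 0.5 * (lo + hi)
                g_mid = g(mid)
                if g_lo * g_mid <= 0.0:
                    hi = mid
                else:
                    lo, g_lo = mid, g_mid
            roots.append(0.5 * (lo + hi))
    return roots


if __name__ == "__main__":
    print(fixed_points(0.5, 3.0, 1.0))
    print(fixed_points(0.01, 50.0, 0.0))
```

The number of roots returned for very unbalanced parameters depends on the grid and quadrature accuracy, so it should be read as numerical evidence only.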

Conjecture 2.4. If $\mu = \nu$, then $v = \theta + \lambda h(v)$ has a unique fixed point for all $p \in (0, 1/2) \cup (1/2, 1)$.

Notice that in the case with $\mu = \nu$ and $p = 1/2$, $\theta = 0$, $\lambda = \mu^2/4$, and $h(v) = \mathbb{E}[\tanh(v + \sqrt{v}Z)]$.
We have shown in Lemma 4.3 that $h$ is non-decreasing, concave, and $\lim_{v\to 0} h'(v) = 1$. Thus if
$|\mu| \leq 2$, there is a unique fixed point at zero, which is stable; if $|\mu| > 2$, there are two fixed points:
one is zero, which is unstable, and the other is $\bar{v} > 0$, which is stable.
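The stability picture can be checked numerically by iterating $v \mapsto \frac{\mu^2}{4}h(v)$ from a tiny perturbation of zero and from the upper bound $\mu^2/4$. The Python sketch below assumes NumPy and approximates $h$ by Gauss-Hermite quadrature; the names `h_sym` and `iterate_symmetric` are hypothetical.

```python
import math

import numpy as np

_NODES, _WEIGHTS = np.polynomial.hermite_e.hermegauss(80)


def h_sym(v):
    """Quadrature approximation of h(v) = E[tanh(v + sqrt(v) Z)], Z ~ N(0, 1)."""
    v = max(v, 0.0)  # guard against tiny negative round-off
    vals = np.tanh(v + math.sqrt(v) * _NODES)
    return float(np.dot(_WEIGHTS, vals) / math.sqrt(2.0 * math.pi))


def iterate_symmetric(mu, v0, iters=5000):
    """Iterate v <- (mu^2 / 4) * h_sym(v) from v0 (the case mu = nu, p = 1/2)."""
    lam = mu ** 2 / 4.0
    v = v0
    for _ in range(iters):
        v = lam * h_sym(v)
    return v


if __name__ == "__main__":
    for mu in (1.5, 3.0):
        from_small = iterate_symmetric(mu, 1e-6)       # perturb the fixed point 0
        from_top = iterate_symmetric(mu, mu ** 2 / 4)  # upper bound, since h <= 1
        print(mu, from_small, from_top)
```

For $|\mu| < 2$ both runs should collapse to (numerically) zero, while for $|\mu| > 2$ the run started near zero should move away from it and meet the run started from above at $\bar{v}$, since the slope of the map at zero is $\mu^2/4 > 1$.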

2.5 Notation and Organization of the Paper 

For any positive integer $n$, let $[n] = \{1, \ldots, n\}$. For any set $T \subset [n]$, let $|T|$ denote its cardinality
and $T^c$ denote its complement. We use standard big O notations, e.g., for any sequences $\{a_n\}$ and
$\{b_n\}$, $a_n = \Theta(b_n)$ if there is an absolute constant $c > 0$ such that $1/c \leq a_n/b_n \leq c$. Let $\mathrm{Bern}(p)$
denote the Bernoulli distribution with mean $p$ and $\mathrm{Binom}(n, p)$ denote the binomial distribution
with $n$ trials and success probability $p$. All logarithms are natural and we use the convention
$0 \log 0 = 0$. We say a sequence of events $S_n$ holds with high probability if $\mathbb{P}\{S_n\} \to 1$.

Figure 2: The plot of $\theta + \lambda h(v)$ ($y$ axis) versus $v$ ($x$ axis) in the case $p = 0.01$. Left frame: $\mu = 50$
and $\nu = 0$; right frame: $\mu = 40$ and $\nu = 1.5$. It shows that $v = \theta + \lambda h(v)$ has three fixed points: the
smallest one is $\underline{v}$, corresponding to the performance of local BP; the largest one is $\bar{v}$, corresponding to
the lower bound on the expected misclassified fraction; the intermediate one is unstable.

The rest of this paper is organized as follows. Section 3 focuses on the inference problems on 
the tree model. The analysis of the belief propagation algorithm on trees and the proofs of our 
main theorems are given in Section 4. The proofs of auxiliary lemmas can be found in Appendix A. 

3 Inference Problems on Galton-Watson Tree Model

In this section, we first introduce the inference problems on Galton-Watson trees, and then relate
them to the cluster recovery problem under the stochastic block model.

Definition 3.1. For a vertex $u$, we denote by $(T_u, \tau)$ the following Poisson two-type branching
process tree rooted at $u$, where $\tau$ is a $\{\pm\}$ labeling of the vertices of $T_u$. Let $\tau_u = +$ with probability
$p$ and $\tau_u = -$ with probability $\bar{p}$, where $\bar{p} = 1 - p$. Now recursively for each vertex $i$ in $T_u$, given
its label $\tau_i = +$, $i$ will have $L_i \sim \mathrm{Pois}(pa)$ children $j$ with $\tau_j = +$ and $M_i \sim \mathrm{Pois}(\bar{p}b)$ children $j$ with
$\tau_j = -$; given its label $\tau_i = -$, $i$ will have $L_i \sim \mathrm{Pois}(pb)$ children $j$ with $\tau_j = +$ and $M_i \sim \mathrm{Pois}(\bar{p}c)$
children $j$ with $\tau_j = -$.
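To make Definition 3.1 concrete, here is a minimal Python sketch that samples the labeled Poisson tree breadth-first up to a given depth. It assumes NumPy's `Generator` API (`default_rng`, `random`, `poisson`); the function name `sample_tree` and the list-based representation (vertex 0 is the root) are our own choices.

```python
import numpy as np


def sample_tree(p, a, b, c, depth, rng):
    """Sample (T_u, tau) down to the given depth, following Definition 3.1.

    Returns (labels, children, level): labels[i] is +1 or -1, children[i]
    lists the child indices of vertex i, and level[i] is its depth.
    """
    q = 1.0 - p
    labels = [1 if rng.random() < p else -1]  # root label: + w.p. p
    children = [[]]
    level = [0]
    frontier = [0]
    for d in range(depth):
        next_frontier = []
        for i in frontier:
            if labels[i] == 1:
                n_plus, n_minus = int(rng.poisson(p * a)), int(rng.poisson(q * b))
            else:
                n_plus, n_minus = int(rng.poisson(p * b)), int(rng.poisson(q * c))
            for lab in [1] * n_plus + [-1] * n_minus:
                labels.append(lab)
                children.append([])
                level.append(d + 1)
                children[i].append(len(labels) - 1)
                next_frontier.append(len(labels) - 1)
        frontier = next_frontier
    return labels, children, level


if __name__ == "__main__":
    labels, children, level = sample_tree(0.5, 3.0, 1.0, 2.0, 3, np.random.default_rng(1))
    print(len(labels), max(level))
```

The vertices with `level == depth` form the boundary $\partial T^t_u$ used in the definitions that follow.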

For any vertex $i$ in $T_u$, let $T^t_i$ denote the subtree of $T_u$ of depth $t$ rooted at vertex $i$, and $\partial T^t_i$
denote the set of vertices at the boundary of $T^t_i$. With a slight abuse of notation, let $\tau_A$ denote the
vector consisting of labels of vertices in $A$, where $A$ could be either a set of vertices or a subgraph
in $T_u$. We first consider the problem of estimating the label of the root $u$ given the observation of $T^t_u$
and $\tau_{\partial T^t_u}$. Notice that the labels of the vertices of $T^t_u$ not in $\partial T^t_u$ are not observed.

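Definition 3.2. The detection problem on the tree with exact information at the boundary is the
problem of inferring $\tau_u$ from the observation of $T^t_u$ and $\tau_{\partial T^t_u}$. The error probability for an estimator
$\hat{\tau}_u(T^t_u, \tau_{\partial T^t_u})$ is defined by

$$p_{T^t}(\hat{\tau}_u) = p\,\mathbb{P}\{\hat{\tau}_u = - \mid \tau_u = +\} + \bar{p}\,\mathbb{P}\{\hat{\tau}_u = + \mid \tau_u = -\}.$$

Let $p^*_{T^t}$ denote the minimum error probability among all estimators based on $T^t_u$ and $\tau_{\partial T^t_u}$.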
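The optimal estimator for minimizing $p_{T^t}$ is the maximum a posteriori (MAP) estimator, which
can be expressed in terms of the log likelihood ratio:

$$\hat{\tau}_{MAP} = 2 \times \mathbf{1}\{\Lambda^t_u \geq -\phi\} - 1,$$

where

$$\Lambda^t_i = \frac{1}{2}\log\frac{\mathbb{P}\{T^t_i, \tau_{\partial T^t_i} \mid \tau_i = +\}}{\mathbb{P}\{T^t_i, \tau_{\partial T^t_i} \mid \tau_i = -\}}$$

for all $i$ in $T_u$, and $\phi = \frac{1}{2}\log\frac{p}{\bar{p}}$. Thus, the minimum error probability $p^*_{T^t}$ is given by

$$p^*_{T^t} = \frac{1}{2} - \frac{1}{2}\mathbb{E}\left[\left|\mathbb{P}\{\tau_u = + \mid T^t_u, \tau_{\partial T^t_u}\} - \mathbb{P}\{\tau_u = - \mid T^t_u, \tau_{\partial T^t_u}\}\right|\right]. \tag{8}$$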

We then consider the problem of estimating $\tau_u$ given only the observation of $T^t_u$. Notice that in this case
the true labels of the vertices in $\partial T^t_u$ are not observed either.

Definition 3.3. The detection problem on the tree is the problem of inferring $\tau_u$ from the observation
of $T^t_u$. The error probability for an estimator $\hat{\tau}_u(T^t_u)$ is defined by

$$q_{T^t}(\hat{\tau}_u) = p\,\mathbb{P}\{\hat{\tau}_u = - \mid \tau_u = +\} + \bar{p}\,\mathbb{P}\{\hat{\tau}_u = + \mid \tau_u = -\}.$$

Let $q^*_{T^t}$ denote the minimum error probability among all estimators based on $T^t_u$.

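In passing, we remark that the only difference between Definition 3.2 and Definition 3.3 is that
the exact labels at the boundary of the tree are revealed to estimators in the former and hidden in
the latter. The optimal estimator for minimizing $q_{T^t}$ is the maximum a posteriori (MAP) estimator,
which can be expressed in terms of the log likelihood ratio:

$$\hat{\tau}_{MAP} = 2 \times \mathbf{1}\{\Gamma^t_u \geq -\phi\} - 1,$$

where

$$\Gamma^t_i = \frac{1}{2}\log\frac{\mathbb{P}\{T^t_i \mid \tau_i = +\}}{\mathbb{P}\{T^t_i \mid \tau_i = -\}}$$

for all $i$ in $T_u$, and $\phi = \frac{1}{2}\log\frac{p}{\bar{p}}$. The minimum error probability $q^*_{T^t}$ is given by

$$q^*_{T^t} = \frac{1}{2} - \frac{1}{2}\mathbb{E}\left[\left|\mathbb{P}\{\tau_u = + \mid T^t_u\} - \mathbb{P}\{\tau_u = - \mid T^t_u\}\right|\right]. \tag{9}$$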

If $d_+ = d_-$, then the distribution of $T^t_u$ conditional on $\tau_u = +$ is the same as that conditional
on $\tau_u = -$. Thus, $\Gamma^t_u = 0$ and the MAP estimator reduces to the trivial estimator, which always
guesses the label to be $+$ if $p > 1/2$ and $-$ if $p < 1/2$, and $q^*_{T^t} = \min\{p, \bar{p}\}$. If $d_+ \neq d_-$, then $T^t_u$
becomes statistically correlated with $\tau_u$, and it is possible to do better than the trivial estimator
based on $T^t_u$.

For the tree model, the likelihoods can be computed exactly via a belief propagation algorithm.
The following lemma gives a recursive formula to compute $\Lambda^t_u$ and $\Gamma^t_u$; no approximations are needed.
Let $\partial i$ denote the set of children of vertex $i$.
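Lemma 3.4. Recall $F(x) = \frac{1}{2}\log\frac{pae^{2x} + \bar{p}b}{pbe^{2x} + \bar{p}c}$. For $t \geq 0$,

$$\Lambda^{t+1}_i = -\frac{d_+ - d_-}{2} + \sum_{j \in \partial i} F(\Lambda^t_j), \tag{10}$$

$$\Gamma^{t+1}_i = -\frac{d_+ - d_-}{2} + \sum_{j \in \partial i} F(\Gamma^t_j), \tag{11}$$

with $\Lambda^0_i = \infty$ if $\tau_i = +$ and $\Lambda^0_i = -\infty$ if $\tau_i = -$, and $\Gamma^0_i = 0$ for all $i$.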
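The recursions (10) and (11) can be checked directly on a small explicit tree. The Python sketch below (function names hypothetical) evaluates them bottom-up, taking $F(x) = \frac{1}{2}\log\frac{pae^{2x} + \bar{p}b}{pbe^{2x} + \bar{p}c}$ and $d_\pm$ as in Section 2.1. For a depth-one tree, $\Lambda^1_u$ should equal half the log-ratio of the Poisson likelihoods of the observed numbers of $+$ and $-$ children, and $\Gamma^1_u$ half the log-ratio of the Poisson likelihoods of the total number of children under $\mathrm{Pois}(d_+)$ versus $\mathrm{Pois}(d_-)$; these are exactly the checks in the accompanying test.

```python
import math


def F(x, p, a, b, c):
    """F(x) = 1/2 log((p a e^{2x} + (1-p) b) / (p b e^{2x} + (1-p) c)), overflow-safe.

    In particular F(+inf) = 1/2 log(a/b) and F(-inf) = 1/2 log(b/c).
    """
    q = 1.0 - p
    if x >= 0:
        e = math.exp(-2.0 * x)
        return 0.5 * math.log((p * a + q * b * e) / (p * b + q * c * e))
    e = math.exp(2.0 * x)
    return 0.5 * math.log((p * a * e + q * b) / (p * b * e + q * c))


def tree_llr(children, labels, p, a, b, c, t, reveal_boundary):
    """Lambda^t at the root (reveal_boundary=True) or Gamma^t (False), via (10)/(11).

    children[i] lists the children of vertex i and vertex 0 is the root.
    Vertices t levels below the root are the boundary: they start at +/-inf
    (Lambda^0, labels revealed) or at 0 (Gamma^0, labels hidden).
    """
    q = 1.0 - p
    base = -((p * a + q * b) - (p * b + q * c)) / 2.0  # -(d_+ - d_-)/2

    def value(i, s):
        if s == 0:
            if reveal_boundary:
                return math.inf if labels[i] == 1 else -math.inf
            return 0.0
        return base + sum(F(value(j, s - 1), p, a, b, c) for j in children[i])

    return value(0, t)


if __name__ == "__main__":
    children = [[1, 2, 3], [], [], []]  # a star: root 0 with three children
    labels = [1, 1, 1, -1]
    print(tree_llr(children, labels, 0.5, 3.0, 1.0, 2.0, 1, True))
    print(tree_llr(children, labels, 0.5, 3.0, 1.0, 2.0, 1, False))
```

Vertices above the boundary that happen to have no children simply contribute the base term, which matches the empty sum in (10) and (11).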



3.1 Connection between the Graph Problem and Tree Problems 

For the reconstruction problem on the graph, recall that $p_G(\hat{\sigma}^t_{BP})$ denotes the expected fraction of
vertices misclassified by $\hat{\sigma}^t_{BP}$ as per (2), and $p^*_G$ is the minimum expected misclassified fraction. For the
reconstruction problems on trees, recall that $p^*_{T^t}$ is the minimum error probability of estimating $\tau_u$
based on $T^t_u$ and $\tau_{\partial T^t_u}$ as per (8), and $q^*_{T^t}$ is the minimum error probability of estimating $\tau_u$ based on $T^t_u$
as per (9). In this section, we show that in the limit $n \to \infty$, $p_G(\hat{\sigma}^t_{BP})$ equals $q^*_{T^t}$, and $p^*_G$ is
bounded from below by $p^*_{T^t}$ for any $t \geq 1$. Notice that $q^*_{T^t}$ and $p^*_{T^t}$ depend on $n$ only through
the parameters $a$, $b$, and $c$.

A key ingredient is to show that $G$ is locally tree-like with high probability in the regime $b = n^{o(1)}$. Let $G^t_u$ denote the subgraph of $G$ induced by vertices whose distance to $u$ is at most $t$, and let $\partial G^t_u$ denote the set of vertices whose distance from $u$ is precisely $t$. In the following, for ease of notation, we write $T^t_u$ as $T^t$ and $G^t_u$ as $G^t$ when there is no ambiguity. With a slight abuse of notation, let $\sigma_A$ denote the vector consisting of labels of vertices in $A$, where $A$ could be either a set of vertices or a subgraph in $G$. The following lemma, proved in [40], shows that we can construct a coupling such that $(G^t, \sigma_{G^t}) = (T^t, \tau_{T^t})$ with probability converging to 1 when $b^t = n^{o(1)}$.

Lemma 3.5. For $t = t(n)$ such that $b^t = n^{o(1)}$, there exists a coupling between $(G, \sigma)$ and $(T, \tau)$ such that $(G^t, \sigma_{G^t}) = (T^t, \tau_{T^t})$ with probability converging to 1.

Suppose that $(G^t, \sigma_{G^t}) = (T^t, \tau_{T^t})$ holds. Then, by comparing the BP iterations (4) and (5) with the recursion of the log-likelihood ratio $\Gamma^t_u$ given in (11), we find that $R^t_u$ exactly equals $\Gamma^t_u$, i.e., the BP algorithm defined in Algorithm 1 exactly computes the log-likelihood ratio for the tree model. Building upon this intuition, the following lemma shows that $p_G(\widehat{\sigma}_{\mathrm{BP}})$ equals $q^*_{T^t}$ as $n \to \infty$.

Lemma 3.6. For $t = t(n)$ such that $b^t = n^{o(1)}$,

$$\lim_{n \to \infty} \left| p_G(\widehat{\sigma}_{\mathrm{BP}}) - q^*_{T^t} \right| = 0.$$

Proof. In view of Lemma 3.5, we can construct a coupling such that $(G^t, \sigma_{G^t}) = (T^t, \tau_{T^t})$ with probability converging to 1. On the event $(G^t, \sigma_{G^t}) = (T^t, \tau_{T^t})$, we have that $R^t_u = \Gamma^t_u$. Hence,

$$p_G(\widehat{\sigma}_{\mathrm{BP}}) = q^*_{T^t} + o(1), \qquad (12)$$

where the $o(1)$ term comes from the coupling error. □

The following lemma shows that $p^*_G$ is lower bounded by $p^*_{T^t}$ as $n \to \infty$.

Lemma 3.7. For $t = t(n)$ such that $b^t = n^{o(1)}$,

$$\limsup_{n \to \infty} \left( p^*_G - p^*_{T^t} \right) \ge 0.$$
We pause briefly to give some intuition behind the lemma. To lower bound $p^*_G$, it suffices to lower bound the error probability of estimating $\sigma_u$ for a given vertex $u$ based on graph $G$. To this end, we consider an oracle estimator which, in addition to the graph structure, is given the exact labels of all vertices at distance exactly $t$ from $u$. We further show that once we condition on the exact labels at distance $t$, $\sigma_u$ becomes asymptotically independent of the labels of all vertices at distance larger than $t$ from $u$. Hence, the oracle estimator is effectively equivalent to the MAP estimator based solely on the graph structure in $G^t_u$ and the exact labels at distance $t$. By the coupling lemma, $G^t_u$ is a tree with high probability, and thus the error probability of the oracle estimator asymptotically equals $p^*_{T^t}$.

4 Gaussian Density Evolution 

In the previous section, we have argued that in the limit $n \to \infty$, $p_G(\widehat{\sigma}_{\mathrm{BP}})$ equals $q^*_{T^t}$, and $p^*_G$ is bounded from below by $p^*_{T^t}$. In this section, we analyze recursions (10) and (11) using density evolution analysis with Gaussian approximations, and derive simple formulas for $p^*_{T^t}$ and $q^*_{T^t}$ in the limit $n \to \infty$. Afterwards, we give the proof of Theorem 2.2.

Notice that $\Gamma^t_u$ is a function of $T^t_u$ alone. Since the subtrees $\{T^{t-1}_i\}_{i \in \partial u}$ conditional on $\tau_u$ are independent and identically distributed, $\{\Gamma^{t-1}_i\}_{i \in \partial u}$ conditional on $\tau_u$ are also independent and identically distributed. Thus, in view of the recursion (11), $\Gamma^t_u$ can be viewed as a sum of a Poisson number of i.i.d. random variables. When the expected degree of $u$ tends to infinity, due to the central limit theorem, we expect that the distribution of $\Gamma^t_u$ conditional on $\tau_u$ is approximately Gaussian. Moreover, the construction of the subtree $T^t_i$ conditional on $\tau_i$ is the same as the construction of $T^t_u$ conditional on $\tau_u$. Therefore, for any $i \in \partial u$, the distribution of $\Gamma^t_i$ conditional on $\tau_i$ is the same as the distribution of $\Gamma^t_u$ conditional on $\tau_u$. Similar conclusions hold for $\Lambda^t_u$ as well.

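Let $Z^t_\pm$ ($W^t_\pm$) denote a random variable that has the same distribution as $\Gamma^t_u$ ($\Lambda^t_u$) conditional on $\tau_u = \pm$. The following lemma provides expressions for the mean and variance of $Z^t_+$ and $Z^t_-$.

Recall that $\lambda = \frac{p(\mu+\nu)^2}{8}$ and $\theta = \frac{p(\mu-\nu)^2}{8} + \frac{(1-2p)\nu^2}{4}$.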

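Lemma 4.1. For all $t \ge 0$,

$$\mathbb{E}\left[Z^{t+1}_\pm\right] = \pm\theta \pm \lambda\,\mathbb{E}\left[\tanh(Z^t_+ + \varphi)\right] + O(b^{-1/2}), \qquad (13)$$

$$\mathrm{var}\left(Z^{t+1}_\pm\right) = \theta + \lambda\,\mathbb{E}\left[\tanh(Z^t_+ + \varphi)\right] + O(b^{-1/2}). \qquad (14)$$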


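Recall that $(v_t : t \ge 0)$ satisfies $v_0 = 0$ and

$$v_{t+1} = \theta + \lambda h(v_t) = \theta + \lambda\,\mathbb{E}\left[\tanh\left(v_t + \sqrt{v_t}\,Z + \varphi\right)\right],$$

where $Z \sim \mathcal{N}(0, 1)$. Similarly, define $(w_t : t \ge 1)$ by $w_1 = \theta + \lambda = \frac{p\mu^2 + \bar{p}\nu^2}{4}$ and

$$w_{t+1} = \theta + \lambda h(w_t) = \theta + \lambda\,\mathbb{E}\left[\tanh\left(w_t + \sqrt{w_t}\,Z + \varphi\right)\right].$$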
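The recursions for $v_t$ and $w_t$ are one-dimensional and easy to iterate numerically. The sketch below is an illustration only: it takes $\lambda = p(\mu+\nu)^2/8$ and $\theta = p(\mu-\nu)^2/8 + (1-2p)\nu^2/4$ (our reading of the constants recalled in this section), approximates the Gaussian expectation in $h$ with a trapezoid rule on $[-10, 10]$ (an implementation choice, not from the paper), and returns both sequences. For $\varphi = 0$ and $\mu \neq \nu$ one expects $v_t$ to increase and $w_t$ to decrease to a common limit, as argued in the proof of Theorem 2.2.

```python
import math


def gauss_expect(g, n=2001, lim=10.0):
    """Trapezoid-rule approximation of E[g(Z)] for Z ~ N(0, 1)."""
    step = 2.0 * lim / (n - 1)
    total = 0.0
    for k in range(n):
        z = -lim + k * step
        weight = 0.5 if k in (0, n - 1) else 1.0
        total += weight * g(z) * math.exp(-0.5 * z * z)
    return total * step / math.sqrt(2.0 * math.pi)


def h(v, phi):
    """h(v) = E[tanh(v + sqrt(v) Z + phi)]."""
    v = max(v, 0.0)  # guard against tiny negative rounding errors
    s = math.sqrt(v)
    return gauss_expect(lambda z: math.tanh(v + s * z + phi))


def density_evolution(mu, nu, p, iters=60):
    """Return ([v_0, v_1, ...], [w_1, w_2, ...])."""
    phi = 0.5 * math.log(p / (1.0 - p))
    lam = p * (mu + nu) ** 2 / 8.0
    theta = p * (mu - nu) ** 2 / 8.0 + (1.0 - 2.0 * p) * nu ** 2 / 4.0
    v = [0.0]            # v_0 = 0
    w = [theta + lam]    # w_1 = theta + lambda
    for _ in range(iters):
        v.append(theta + lam * h(v[-1], phi))
        w.append(theta + lam * h(w[-1], phi))
    return v, w


v, w = density_evolution(mu=3.0, nu=1.0, p=0.5)
print(v[-1], w[-1])  # both approximate the fixed point v*
```

A quick consistency check on the constants: $\theta + \lambda h(0) = \theta + \lambda(2p-1)$ equals $(p\mu - \bar{p}\nu)^2/4$, and $\theta + \lambda = (p\mu^2 + \bar{p}\nu^2)/4$.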

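The following lemma shows that for any fixed $t$, $Z^t_\pm$ and $W^t_\pm$ are approximately Gaussian.

Lemma 4.2. For any $t \ge 0$, as $n \to \infty$,

$$\sup_x \left| \mathbb{P}\left\{ \frac{Z^t_\pm \mp v_t}{\sqrt{v_t}} \le x \right\} - \mathbb{P}\{Z \le x\} \right| = O(b^{-1/2}). \qquad (15)$$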
Similarly, for any $t \ge 1$, as $n \to \infty$,

$$\sup_x \left| \mathbb{P}\left\{ \frac{W^t_\pm \mp w_t}{\sqrt{w_t}} \le x \right\} - \mathbb{P}\{Z \le x\} \right| = O(b^{-1/2}). \qquad (16)$$


Before proving Theorem 2.2, we also need a key lemma, which shows that $h$ is continuous and non-decreasing, and that $h$ is concave if $\varphi = 0$.


Lemma 4.3. $h(v)$ is continuous on $[0, \infty)$ and, for $v \in (0, +\infty)$,

$$h'(v) = \mathbb{E}\left[\left(1 - \tanh(v + \sqrt{v}\,Z + \varphi)\right)\left(1 - \tanh^2(v + \sqrt{v}\,Z + \varphi)\right)\right] > 0.$$

Furthermore, if $\varphi = 0$, then $h'(v) \ge h'(w)$ for $0 < v < w < \infty$.
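A quick numerical sanity check of the derivative formula in Lemma 4.3 compares it with a central finite difference of $h$. The sketch reuses a trapezoid-rule approximation of Gaussian expectations (an assumption of the illustration, not part of the proof); for $\varphi = 0$ it also spot-checks that $h'$ is non-increasing, i.e., that $h$ is concave.

```python
import math


def gauss_expect(g, n=2001, lim=10.0):
    """Trapezoid-rule approximation of E[g(Z)] for Z ~ N(0, 1)."""
    step = 2.0 * lim / (n - 1)
    total = 0.0
    for k in range(n):
        z = -lim + k * step
        weight = 0.5 if k in (0, n - 1) else 1.0
        total += weight * g(z) * math.exp(-0.5 * z * z)
    return total * step / math.sqrt(2.0 * math.pi)


def h(v, phi):
    """h(v) = E[tanh(v + sqrt(v) Z + phi)]."""
    s = math.sqrt(v)
    return gauss_expect(lambda z: math.tanh(v + s * z + phi))


def h_prime_formula(v, phi):
    """E[(1 - tanh(X)) (1 - tanh(X)^2)] with X = v + sqrt(v) Z + phi."""
    s = math.sqrt(v)

    def integrand(z):
        th = math.tanh(v + s * z + phi)
        return (1.0 - th) * (1.0 - th * th)

    return gauss_expect(integrand)


def h_prime_numeric(v, phi, eps=1e-5):
    """Central finite difference of h."""
    return (h(v + eps, phi) - h(v - eps, phi)) / (2.0 * eps)


for v in (0.3, 1.0, 2.5):
    print(v, h_prime_formula(v, 0.4), h_prime_numeric(v, 0.4))
```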

Finally, we are ready to prove Theorem 2.2 based on Lemma 4.2 and Lemma 4.3. 
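Proof of Theorem 2.2. In view of Lemma 4.2,

$$\lim_{n \to \infty} \mathbb{P}\left\{\Gamma^t_u \ge -\varphi \mid \tau_u = -\right\} = Q\left(\frac{v_t - \varphi}{\sqrt{v_t}}\right), \qquad \lim_{n \to \infty} \mathbb{P}\left\{\Gamma^t_u < -\varphi \mid \tau_u = +\right\} = Q\left(\frac{v_t + \varphi}{\sqrt{v_t}}\right). \qquad (17)$$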

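Hence, it follows from Lemma 3.6 that

$$\lim_{n \to \infty} p_G(\widehat{\sigma}_{\mathrm{BP}}) = \lim_{n \to \infty} q^*_{T^t} = \mathbb{E}\left[Q\left(\frac{v_t + U}{\sqrt{v_t}}\right)\right],$$

where $U = -\varphi$ with probability $1 - p$ and $U = \varphi$ with probability $p$.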
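The limit in the last display is an explicit function of $v_t$ and $p$: averaging over $U$ gives $p\,Q((v_t+\varphi)/\sqrt{v_t}) + \bar{p}\,Q((v_t-\varphi)/\sqrt{v_t})$. The helper below (its name is hypothetical) evaluates it using the standard identity $Q(x) = \frac{1}{2}\mathrm{erfc}(x/\sqrt{2})$. For $p = 1/2$ it reduces to $Q(\sqrt{v_t})$, and as $v_t \to 0$ it approaches the error $\min\{p, \bar{p}\}$ of the trivial estimator.

```python
import math


def Q(x):
    """Standard normal tail P{Z > x}."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def limiting_error(v, p):
    """E[Q((v + U) / sqrt(v))], U = +phi w.p. p and U = -phi w.p. 1 - p."""
    phi = 0.5 * math.log(p / (1.0 - p))
    s = math.sqrt(v)
    return p * Q((v + phi) / s) + (1.0 - p) * Q((v - phi) / s)


print(limiting_error(1.0, 0.5), Q(1.0))  # equal when p = 1/2
```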
We prove that $v_{t+1} \ge v_t$ for $t \ge 0$ by induction. Recall that

$$v_0 = 0 \le \frac{(p\mu - \bar{p}\nu)^2}{4} = \theta + \lambda h(v_0) = v_1.$$

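Suppose $v_{t+1} \ge v_t$ holds; we shall show the claim also holds for $t + 1$. In particular, since $h$ is continuous on $[0, \infty)$ and differentiable on $(0, \infty)$, it follows from the mean value theorem that

$$v_{t+2} - v_{t+1} = \lambda\left(h(v_{t+1}) - h(v_t)\right) = \lambda h'(x)\left(v_{t+1} - v_t\right)$$

for some $x \in (v_t, v_{t+1})$. Lemma 4.3 implies that $h'(x) > 0$ for $x \in (0, \infty)$; since also $\lambda \ge 0$, it follows that $v_{t+2} \ge v_{t+1}$. Hence, $v_t$ is non-decreasing in $t$. Next we argue by induction that $v_t \le \underline{v}$ for all $t \ge 0$, where $\underline{v}$ is the smallest fixed point of $v = \theta + \lambda h(v)$. For the base case, $v_0 = 0 \le \underline{v}$. If $v_t \le \underline{v}$, then by the monotonicity of $h$, $v_{t+1} = \theta + \lambda h(v_t) \le \theta + \lambda h(\underline{v}) = \underline{v}$. Hence, $v_t \le \underline{v}$ for all $t$, and thus $\lim_{t \to \infty} v_t \le \underline{v}$. By the continuity of $h$, $\lim_{t \to \infty} v_t$ is also a fixed point of $v = \theta + \lambda h(v)$, and consequently $\lim_{t \to \infty} v_t = \underline{v}$. Therefore,

$$\lim_{t \to \infty} \lim_{n \to \infty} p_G(\widehat{\sigma}_{\mathrm{BP}}) = \lim_{t \to \infty} \lim_{n \to \infty} q^*_{T^t} = \mathbb{E}\left[Q\left(\frac{\underline{v} + U}{\sqrt{\underline{v}}}\right)\right].$$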

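Next, we prove the claim for $p^*_G$. In view of Lemma 4.2,

$$\lim_{n \to \infty} \mathbb{P}\left\{\Lambda^t_u \ge -\varphi \mid \tau_u = -\right\} = Q\left(\frac{w_t - \varphi}{\sqrt{w_t}}\right), \qquad \lim_{n \to \infty} \mathbb{P}\left\{\Lambda^t_u < -\varphi \mid \tau_u = +\right\} = Q\left(\frac{w_t + \varphi}{\sqrt{w_t}}\right).$$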
Hence, it follows from Lemma 3.7 that

$$\liminf_{n \to \infty} p^*_G \ge \lim_{n \to \infty} p^*_{T^t} = \mathbb{E}\left[Q\left(\frac{w_t + U}{\sqrt{w_t}}\right)\right].$$

Recall that wi = 9 + X > wt- By the same argument of proving vt is non-decreasing, one can show 
that Wt is non-increasing in t. Also, by the same argument of proving vt is upper bounded by u, 
one can show that wt is lower bounded by v, where v is the largest fixed point of v = 6 + Xh{v). 
Thus, lim^^oo wt = v and v<wi = 9 + X = {pp? + pz^^)/4. Therefore, 


lim inf pL 
n^oo 


lim lim inf p*Q > lim lim pY = E 

t^oo n^oo t^oo n^oo 


Q 


v + U 


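If $\varphi = 0$ and $\mu \neq \nu$, then $v_1 > 0$ and Lemma 4.3 shows that $h'(v) \ge h'(w)$ for all $0 < v < w < \infty$. Since $v_1 = \theta + \lambda h(0) > 0$ and $\underline{v} = \theta + \lambda h(\underline{v})$, it must hold that $\lambda h'(\underline{v}) < 1$. Thus $\lambda h'(v) < 1$ for all $v \ge \underline{v}$, and consequently $\theta + \lambda h(v) < v$ for all $v > \underline{v}$. Hence, $\underline{v} = \overline{v} = v^*$, where $v^*$ is the unique fixed point of $v = \frac{(\mu-\nu)^2}{16} + \frac{(\mu+\nu)^2}{16}\mathbb{E}\left[\tanh(v + \sqrt{v}\,Z)\right]$. Therefore,

$$\liminf_{n \to \infty} p^*_G \ge \lim_{t \to \infty} \lim_{n \to \infty} p^*_{T^t} = \lim_{t \to \infty} \lim_{n \to \infty} q^*_{T^t} = \lim_{t \to \infty} \lim_{n \to \infty} p_G(\widehat{\sigma}_{\mathrm{BP}}) = Q\left(\sqrt{v^*}\right).$$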

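Since $p^*_G$ is the minimum expected misclassified fraction, it also holds that $\limsup_{n \to \infty} p^*_G \le \lim_{n \to \infty} p_G(\widehat{\sigma}_{\mathrm{BP}})$ for all $t \ge 1$, and consequently

$$\limsup_{n \to \infty} p^*_G \le \lim_{t \to \infty} \lim_{n \to \infty} p_G(\widehat{\sigma}_{\mathrm{BP}}).$$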


Combining the last two displayed equations gives that

$$\lim_{n \to \infty} p^*_G = \lim_{t \to \infty} \lim_{n \to \infty} p_G(\widehat{\sigma}_{\mathrm{BP}}) = Q\left(\sqrt{v^*}\right).$$

□


4.1 Degree-Uncorrelated Case 

As remarked in Section 2.2, in the case $p\mu = \bar{p}\nu$, the vertex degrees are statistically uncorrelated with the cluster structure, and no local algorithm is capable of non-trivial detection. However, it is still possible that local algorithms combined with some efficient global algorithms achieve the minimum expected misclassified fraction. In this subsection, we show that this is indeed the case if $p = 1/2$, $\mu = \nu$ with $|\mu| > 2$, and $b = o(\log n)$. The algorithm described in Algorithm 2 was introduced in [36]; we give the full description for completeness.

Notice that Algorithm 2 runs in time polynomial in $n$. The algorithm consists of two main steps. First, we apply some global algorithm to get a correlated clustering when $|\mu| > 2$, for example, the algorithm studied in [38]. Then, we apply the local BP algorithm to boost the correlated clustering from the first step to achieve the minimum expected misclassified fraction. To ensure that the first and second steps are independent of each other, for each vertex $u$, we first withhold the $(t-1)$-local neighborhood of $u$, and then apply the global detection algorithm on the reduced set of vertices. The clustering on the reduced set of vertices is used as the initialization of the local belief propagation algorithm running on the withheld $(t-1)$-local neighborhood of $u$. In this way, the outcome of the global detection algorithm based on the reduced set of vertices is independent of the edges between the withheld $t$-local neighborhood of $u$ and the reduced set of vertices, as well as the edges within the withheld set.
Algorithm 2 Local Belief Propagation Plus Correlated Recovery

1: Input: $n \in \mathbb{N}$, $a = c$, $b > 0$, $p = 1/2$, adjacency matrix $A \in \{0,1\}^{n \times n}$, $t \in \mathbb{N}$.

2: Take $U \subset V$ to be a random subset of size $\lfloor \sqrt{n} \rfloor$. Let $u^* \in U$ be a random vertex in $U$ with at
least 2 log&Zfe) neighbors in V\U. 

3: For $u \in V \setminus U$ do

1. Run a polynomial-time estimator capable of correlated recovery on the subgraph induced by vertices not in $G^{t-1}_u$ and $U$, and let $W^+$ and $W^-$ denote the output partition.

2. Relabel $W^+$ and $W^-$ such that if $a > b$, then $u^*$ has more neighbors in $W^+$ than in $W^-$; otherwise, $u^*$ has more neighbors in $W^-$ than in $W^+$. Let $\alpha_u$ denote the fraction of vertices misclassified by the partition $(W^+, W^-)$.

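3. For all $i \in \cdots$ and $j \in \cdots$, define $R_{i \to j} = \frac{1}{2}\log\frac{1-\alpha_u}{\alpha_u}$ if $i \in W^+$, and $R_{i \to j} = -\frac{1}{2}\log\frac{1-\alpha_u}{\alpha_u}$ if $i \in W^-$.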

4. Run $t - 1$ iterations of message passing as in (4) to compute $R^{t-1}_{i \to u}$ for all of $u$'s neighbors $i$.

5. Compute $R^t_u$ as per (5), and let $\widehat{\sigma}_{\mathrm{BP}}(u) = +$ if $R^t_u \ge -\varphi$; otherwise let $\widehat{\sigma}_{\mathrm{BP}}(u) = -$.

4: For $u \in U$, let $\widehat{\sigma}_{\mathrm{BP}}(u)$ equal $+$ or $-$ uniformly at random. Output $\widehat{\sigma}_{\mathrm{BP}}$.


There is also a subtle issue to overcome. We run the global detection algorithm once for each vertex, and the global detection algorithm cannot reliably estimate the sign of the true $\sigma$ due to the symmetry between $+$ and $-$. Therefore, different runs of the global detection algorithm may produce different estimates of the sign of $\sigma$. We need a way to coordinate different runs of the global detection algorithm so that they have the same estimate of the sign of $\sigma$. To this end, a small random subset $U$ is reserved, and a high-degree vertex $u^*$ in $U$ serves as an anchor. In every run of the global detection algorithm, we relabel the partition if necessary to ensure that $u^*$ always has more neighbors with estimated $+$ labels than neighbors with estimated $-$ labels if $a > b$, and the other way around if $a < b$.
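The anchor step can be illustrated in a few lines. In the hypothetical sketch below, one run of the global algorithm is represented as a dictionary mapping vertices to $\pm 1$, and `anchor_neighbors` stands for the neighbors of $u^*$ in $V \setminus U$; both representations are assumptions made for the illustration. Two runs that differ only by a global sign flip end up with the same orientation after alignment. A tie in the neighbor count is left unchanged here, a case the description above does not specify.

```python
def align_with_anchor(labels, anchor_neighbors, a, b):
    """Flip a +/-1 labeling, if needed, so that the anchor u* has more neighbors
    labeled + than - when a > b, and more labeled - than + when a < b."""
    # Sum of +/-1 labels over u*'s neighbors covered by this run
    balance = sum(labels[v] for v in anchor_neighbors if v in labels)
    if (a > b and balance < 0) or (a < b and balance > 0):
        return {v: -s for v, s in labels.items()}
    return dict(labels)


run_1 = {1: +1, 2: -1, 3: -1, 4: +1}
run_2 = {v: -s for v, s in run_1.items()}  # same partition, opposite sign
neighbors_of_anchor = [1, 2, 3]
print(align_with_anchor(run_1, neighbors_of_anchor, a=5, b=3)
      == align_with_anchor(run_2, neighbors_of_anchor, a=5, b=3))  # True
```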

Finally, we caution the reader that, in addition to the model parameters $a$ and $b$, after each run of the global detection algorithm the algorithm requires knowing $\alpha_u$, which is the fraction of vertices misclassified by the partition $(W^+, W^-)$. In the main analysis, we assume for simplicity that the exact value of $\alpha_u$ is known. One can check that only an estimator $\widehat{\alpha}_u = \alpha_u + o(1)$ with high probability is needed for Theorem 2.3 to hold. In Appendix B, we give an efficient and data-driven procedure to construct such a consistent estimate of $\alpha_u$.

Next, in the limit $n \to \infty$, we give a lower bound on the minimum expected misclassified fraction and an upper bound attainable by $\widehat{\sigma}_{\mathrm{BP}}$. Then we show that the lower and upper bounds match each other in the double limit, where first $n \to \infty$ and then $t \to \infty$.

Recall that the fraction of vertices misclassified by $\widehat{\sigma}$ is defined up to a global flip of signs of $\widehat{\sigma}$ as in (6). The following lemma shows that the minimum expected misclassified fraction is still lower bounded by $p^*_{T^t}$. Its proof is very similar to the proof of Lemma 3.7. The key new challenge is that $\mathbb{E}[\mathcal{O}(\sigma, \widehat{\sigma})]$ does not directly reduce to the error probability of estimating $\sigma_u$ for a given vertex $u$.
Lemma 4.4. For $t = t(n)$ such that $b^t = n^{o(1)}$,

$$\limsup_{n \to \infty} \left( \inf_{\widehat{\sigma}} \mathbb{E}\left[\mathcal{O}(\sigma, \widehat{\sigma})\right] - p^*_{T^t} \right) \ge 0,$$

where $p^*_{T^t}$ is defined in (8) under the tree model with $p = 1/2$ and $a = c$ defined in Definition 3.1.

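In the following, we relate the expected fraction of vertices misclassified by $\widehat{\sigma}_{\mathrm{BP}}$ as defined in Algorithm 2 to an estimation problem on the tree model. In particular, consider the tree model with $p = 1/2$ and $a = c$ as defined in Definition 3.1. Fix an $\alpha \in [0, 1/2]$. Let $\widetilde{\tau}_i = \tau_i$ with probability $1 - \alpha$ and $\widetilde{\tau}_i = -\tau_i$ with probability $\alpha$, independently for all $i \in T_u$. Then $\widetilde{\tau}$ is an $\alpha$-noisy version of $\tau$. Let $q^*_{T^t,\alpha}$ denote the minimum error probability of inferring $\tau_u$ based on $T^t_u$ and $\widetilde{\tau}_{\partial T^t_u}$. The optimal estimator achieving $q^*_{T^t,\alpha}$ is the MAP estimator given by

$$\widehat{\tau}_{\mathrm{MAP}} = 2 \times \mathbf{1}\{\Gamma^t_u \ge 0\} - 1,$$

where

$$\Gamma^t_i = \frac{1}{2}\log\frac{\mathbb{P}\{T^t_i, \widetilde{\tau}_{\partial T^t_i} \mid \tau_i = +1\}}{\mathbb{P}\{T^t_i, \widetilde{\tau}_{\partial T^t_i} \mid \tau_i = -1\}}$$

for all $i$ in $T_u$. The minimum error probability $q^*_{T^t,\alpha}$ is given by

$$q^*_{T^t,\alpha} = \frac{1}{2}\mathbb{P}\{\Gamma^t_u < 0 \mid \tau_u = +\} + \frac{1}{2}\mathbb{P}\{\Gamma^t_u \ge 0 \mid \tau_u = -\} = \frac{1}{2} - \frac{1}{2}\mathbb{E}\left[\left|\mathbb{P}\{\tau_u = + \mid T^t_u, \widetilde{\tau}_{\partial T^t_u}\} - \mathbb{P}\{\tau_u = - \mid T^t_u, \widetilde{\tau}_{\partial T^t_u}\}\right|\right].$$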


It follows from the definition that $q^*_{T^t,\alpha}$ is non-decreasing in $\alpha$. Also, $q^*_{T^t,\alpha} = p^*_{T^t}$ if $\alpha = 0$ and $q^*_{T^t,\alpha} = q^*_{T^t}$ if $\alpha = 1/2$. The following lemma shows that the fraction of vertices misclassified by $\widehat{\sigma}_{\mathrm{BP}}$ as defined in Algorithm 2 is asymptotically no larger than $q^*_{T^t,\alpha}$ for some $\alpha \in [0, 1/2)$.

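Lemma 4.5. There exists an $\alpha \in [0, 1/2)$ such that for $t = t(n)$ with $b^t = n^{o(1)}$,

$$\limsup_{n \to \infty} \left( \mathbb{E}\left[\mathcal{O}(\sigma, \widehat{\sigma}_{\mathrm{BP}})\right] - q^*_{T^t,\alpha} \right) \le 0.$$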


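The following lemma gives a characterization of the distribution of $\Gamma^t_u$ based on density evolution with Gaussian approximations.

Lemma 4.6. Let $Z^t_+$ and $Z^t_-$ denote random variables that have the same distribution as $\Gamma^t_u$ conditional on $\tau_u = +$ and $\tau_u = -$, respectively. For any $t \ge 1$, as $n \to \infty$,

$$\sup_x \left| \mathbb{P}\left\{ \frac{Z^t_\pm \mp u_t}{\sqrt{u_t}} \le x \right\} - \mathbb{P}\{Z \le x\} \right| = O(b^{-1/2}), \qquad (18)$$

where $u_1 = \frac{(1-2\alpha)^2\mu^2}{4}$ and $u_{t+1} = \frac{\mu^2}{4}\mathbb{E}\left[\tanh\left(u_t + \sqrt{u_t}\,Z\right)\right]$.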

We are ready to prove Theorem 2.3 by combining Lemma 4.4, Lemma 4.5, and Lemma 4.6.
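Proof of Theorem 2.3. In view of Lemma 4.6, for $t \ge 1$,

$$\lim_{n \to \infty} \mathbb{P}\left\{\Gamma^t_u \ge 0 \mid \tau_u = -\right\} = \lim_{n \to \infty} \mathbb{P}\left\{\Gamma^t_u < 0 \mid \tau_u = +\right\} = Q\left(\sqrt{u_t}\right).$$

It follows from Lemma 4.5 that there exists an $\alpha \in [0, 1/2)$ such that

$$\limsup_{n \to \infty} \mathbb{E}\left[\mathcal{O}(\sigma, \widehat{\sigma}_{\mathrm{BP}})\right] \le \lim_{n \to \infty} q^*_{T^t,\alpha} = Q\left(\sqrt{u_t}\right).$$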

Let $h(v) = \mathbb{E}[\tanh(v + \sqrt{v}\,Z)]$. In view of Lemma 4.3, $h$ is non-decreasing and concave on $[0, \infty)$, and $\lim_{v \to 0} h'(v) = 1$. Notice that $h(0) = 0$, and thus $0$ is a fixed point of $v = \frac{\mu^2}{4}h(v)$. Moreover, by the mean value theorem, for $v > 0$, $h(v) = h(0) + h'(\xi)v$ for some $\xi \in (0, v)$. Thus $\frac{\mu^2}{4}h(v) = \frac{\mu^2}{4}h'(\xi)v$. By the assumption that $|\mu| > 2$ and $\lim_{v \to 0} h'(v) = 1$, it follows that there exists an $\epsilon > 0$ such that $\frac{\mu^2}{4}h(v) > v$ for all $v \in (0, \epsilon)$. Furthermore, $h(v) \le 1$, and hence $\frac{\mu^2}{4}h(v) < v$ for all sufficiently large $v$. Since $h$ is continuous, $v = \frac{\mu^2}{4}h(v)$ must have nonzero fixed points. Let $\overline{v}$ denote the smallest nonzero fixed point. Then $\overline{v} > 0$, $\frac{\mu^2}{4}h(v) > v$ for all $v \in (0, \overline{v})$, and $\frac{\mu^2}{4}h'(\overline{v}) < 1$. Because $h$ is concave, $h'(v) \le h'(\overline{v})$ for all $v \ge \overline{v}$. Thus $\frac{\mu^2}{4}h(v) < v$ for all $v > \overline{v}$. Therefore, $\overline{v}$ is the unique nonzero fixed point and also the largest fixed point. It follows that if $u_1 \le \overline{v}$, then $\{u_t\}$ is a non-decreasing sequence upper bounded by $\overline{v}$. If $u_1 \ge \overline{v}$, then $\{u_t\}$ is a non-increasing sequence lower bounded by $\overline{v}$. Since $u_1 > 0$ (because $\alpha < 1/2$), it follows that $\lim_{t \to \infty} u_t = \overline{v}$. Hence,

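$$\limsup_{n \to \infty} \inf_{\widehat{\sigma}} \mathbb{E}\left[\mathcal{O}(\sigma, \widehat{\sigma})\right] \le \lim_{t \to \infty} \limsup_{n \to \infty} \mathbb{E}\left[\mathcal{O}(\sigma, \widehat{\sigma}_{\mathrm{BP}})\right] \le \lim_{t \to \infty} \lim_{n \to \infty} q^*_{T^t,\alpha} = Q\left(\sqrt{\overline{v}}\right). \qquad (19)$$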

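It follows from Theorem 2.2 and Lemma 4.4 that

$$\liminf_{n \to \infty} \inf_{\widehat{\sigma}} \mathbb{E}\left[\mathcal{O}(\sigma, \widehat{\sigma})\right] \ge \lim_{t \to \infty} \lim_{n \to \infty} p^*_{T^t} = Q\left(\sqrt{\overline{v}}\right). \qquad (20)$$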

The theorem follows by combining the last two displayed equations. □ 
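The role of the threshold $|\mu| > 2$, and the fact that the limit does not depend on $\alpha$, can be seen numerically. The sketch below iterates $u_{t+1} = \frac{\mu^2}{4}h(u_t)$ from $u_1 = (1-2\alpha)^2\mu^2/4$ (our reading of the initialization in Lemma 4.6), using a trapezoid-rule approximation of $h$ as an illustrative choice, and reports $Q(\sqrt{u_t})$. For $|\mu| > 2$ the iterates started from different $\alpha < 1/2$ reach the same positive fixed point $\overline{v}$; for $|\mu| < 2$ they collapse to the trivial fixed point $0$.

```python
import math


def gauss_expect(g, n=2001, lim=10.0):
    """Trapezoid-rule approximation of E[g(Z)] for Z ~ N(0, 1)."""
    step = 2.0 * lim / (n - 1)
    total = 0.0
    for k in range(n):
        z = -lim + k * step
        weight = 0.5 if k in (0, n - 1) else 1.0
        total += weight * g(z) * math.exp(-0.5 * z * z)
    return total * step / math.sqrt(2.0 * math.pi)


def h(u):
    """h(u) = E[tanh(u + sqrt(u) Z)] (the case phi = 0)."""
    s = math.sqrt(u)
    return gauss_expect(lambda z: math.tanh(u + s * z))


def u_limit(mu, alpha, iters=80):
    """Iterate u_{t+1} = (mu^2/4) h(u_t) from u_1 = (1 - 2 alpha)^2 mu^2 / 4."""
    u = (1.0 - 2.0 * alpha) ** 2 * mu ** 2 / 4.0
    for _ in range(iters):
        u = mu ** 2 / 4.0 * h(u)
    return u


def Q(x):
    """Standard normal tail P{Z > x}."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


for alpha in (0.0, 0.3):
    u = u_limit(3.0, alpha)
    print(alpha, u, Q(math.sqrt(u)))
```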
A Additional Proofs

A.1 Proof of Lemma 3.4

By definition, $\Lambda^0_i = +\infty$ if $\tau_i = +$ and $\Lambda^0_i = -\infty$ if $\tau_i = -$, and $\Gamma^0_i = 0$ for all $i$. We prove the claim for $\Gamma^t$ with $t \ge 1$; the claim for $\Lambda^t$ with $t \ge 1$ follows similarly.

A key point is to use the independent splitting property of the Poisson distribution to give an equivalent description of the numbers of children of each type for any vertex in the tree. Instead of separately generating the number of children of each type, we can first generate the total number of children and then independently and randomly select the type of each child. For every vertex $i$ in $T_u$, let $N_i$ denote the total number of its children. If $\tau_i = +$, then $N_i \sim \mathrm{Pois}(d_+)$, and for each child $j \in \partial i$, independently of everything else, $\tau_j = +$ with probability $pa/d_+$ and $\tau_j = -$ with probability $\bar{p}b/d_+$, where $d_+ = pa + \bar{p}b$. If $\tau_i = -$, then $N_i \sim \mathrm{Pois}(d_-)$, and for each child $j \in \partial i$, independently of everything else, $\tau_j = +$ with probability $pb/d_-$ and $\tau_j = -$ with probability $\bar{p}c/d_-$, where $d_- = pb + \bar{p}c$. With this view, the observation of the total number of children $N_i$ of vertex $i$ gives some information, and the conditionally independent messages from those children give additional information on $\tau_i$. Specifically,
where (a) holds because $N_i$ and $T^{t-1}_j$ for $j \in \partial i$ are independent conditional on $\tau_i$; (b) follows because $N_i \sim \mathrm{Pois}(d_+)$ if $\tau_i = +$ and $N_i \sim \mathrm{Pois}(d_-)$ if $\tau_i = -$, and $T^{t-1}_j$ is independent of $\tau_i$ conditional on $\tau_j$; (c) follows from the definition of $\tau_j$, as $\tau_j \sim 2 \cdot \mathrm{Bern}(pa/d_+) - 1$ (resp. $2 \cdot \mathrm{Bern}(pb/d_-) - 1$) conditional on $\tau_i = +$ (resp. $-$); (d) follows from the definition of $\Gamma^{t-1}_j$.


A.2 Proof of Lemma 3.7 

We will show that as $n \to \infty$, $p^*_G$ is bounded from below by $p^*_{T^t}$ for any $t \ge 1$. Before that, we need a key lemma which shows that, conditional on $(G^t, \sigma_{\partial G^t})$, $\sigma_u$ is almost independent of the graph structure outside of $G^t$. The proof is similar to that of [40, Proposition 4.2], which deals with the special case $p = 1/2$ and $a = c$. The key challenge here is that when $p \neq 1/2$ or $a \neq c$, the overall effect of the non-edges depends on $\sigma$, and some extra care has to be taken (see (23) for details).

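Lemma A.1. For $t = t(n)$ such that $b^t = n^{o(1)}$, there exists a sequence of events $\mathcal{E}_n$ such that $\mathbb{P}\{\mathcal{E}_n\} \to 1$ as $n \to \infty$, and on event $\mathcal{E}_n$,

$$\mathbb{P}\{\sigma_u = x \mid G^t, \sigma_{\partial G^t}\} = (1 + o(1))\,\mathbb{P}\{\sigma_u = x \mid G, \sigma_{\partial G^t}\}, \quad \forall x \in \{\pm\}. \qquad (21)$$

Moreover, on event $\mathcal{E}_n$, $(G^t, \sigma_{G^t}) = (T^t, \tau_{T^t})$ holds.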

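Proof. Recall that $G^t$ is the subgraph of $G$ induced by vertices whose distance from $u$ is at most $t$. Let $A$ denote the set of vertices in $G^{t-1}$, $B$ denote the set of vertices in $\partial G^t$, and $C$ denote the set of vertices in $G$ but not in $G^t$. Then $A \cup \partial G^t = G^t$ and $A \cup B \cup C = V$. Define $s_A = \sum_{i \in A}\sigma_i$ and $s_C = \sum_{i \in C}\sigma_i$. Let

$$\mathcal{E}_n = \left\{(G, \sigma) : \left|s_C - (2p-1)|C|\right| \le n^{0.6},\ |B| \le n^{0.1},\ (G^t, \sigma_{G^t}) = (T^t, \tau_{T^t})\right\}.$$

By the assumption $b^t = n^{o(1)}$, it follows that $(G^t, \sigma_{G^t}) = (T^t, \tau_{T^t})$ and $|B| = n^{o(1)}$ with probability converging to 1 (see [40, Proposition 4.2] for a proof). Note that $s_C = 2X - |C|$ for some $X \sim \mathrm{Binom}(|C|, p)$. Letting $\alpha_n = n^{0.6}$, in view of the Bernstein inequality,

$$\mathbb{P}\left\{\left|s_C - (2p-1)|C|\right| > \alpha_n\right\} = \mathbb{P}\left\{\left|X - p|C|\right| > \alpha_n/2\right\} \le 2\exp\left(-\frac{\alpha_n^2/8}{p\bar{p}|C| + \alpha_n/6}\right) = o(1),$$

where the last equality holds because $|C| \le n$ and $\alpha_n/\sqrt{n} \to \infty$. In conclusion, we have that $\mathbb{P}\{\mathcal{E}_n\} \to 1$ as $n \to \infty$.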

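To prove that (21) holds, it suffices to show that on event $\mathcal{E}_n$,

$$\mathbb{P}\{\sigma_u = x \mid G^t, \sigma_{\partial G^t}\} = (1 + o(1))\,\mathbb{P}\{\sigma_u = x \mid G^t, \sigma_{\partial G^t}, \sigma_C\}, \quad \forall x \in \{\pm\}. \qquad (22)$$

In particular, on event $\mathcal{E}_n$,

$$\begin{aligned}
\mathbb{P}\{\sigma_u = x \mid G, \sigma_{\partial G^t}\} &= \sum_{\sigma_C} \mathbb{P}\{\sigma_u = x, \sigma_C \mid G, \sigma_{\partial G^t}\} \\
&= \sum_{\sigma_C} \mathbb{P}\{\sigma_C \mid G, \sigma_{\partial G^t}\}\,\mathbb{P}\{\sigma_u = x \mid G, \sigma_{\partial G^t}, \sigma_C\} \\
&= \sum_{\sigma_C} \mathbb{P}\{\sigma_C \mid G, \sigma_{\partial G^t}\}\,\mathbb{P}\{\sigma_u = x \mid G^t, \sigma_{\partial G^t}, \sigma_C\} \\
&= (1 + o(1))\,\mathbb{P}\{\sigma_u = x \mid G^t, \sigma_{\partial G^t}\},
\end{aligned}$$

where the third equality holds because, conditional on $(G^t, \sigma_{\partial G^t}, \sigma_C)$, $\sigma_u$ is independent of the graph structure outside of $G^t$; the last equality follows from (22). Hence, we are left to show that the desired (22) holds.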

Recall that $G = (V, E)$. For any two sets $U_1, U_2 \subset V$, define

$$\Phi_{U_1, U_2}(G, \sigma) = \prod_{(u, v) \in U_1 \times U_2} \phi_{uv}(G, \sigma),$$

where $(u, v)$ denotes an unordered pair of vertices and
Then the joint distribution of $\sigma$ and $G$ is given by

F {a,G,a} = 2~'^^b,b ^c,c ‘hgcEC ^A,c- 


Notice that $A$ and $C$ are disconnected. We claim that on event $\mathcal{E}_n$, $\Phi_{A,C}$ depends on $\sigma_C$ only through the $o(1)$ term. In particular, on event $\mathcal{E}_n$,
where the second equality holds because $u \in A$ and $v \in C$ imply that $(u, v) \notin E$, and thus $\phi_{uv}$ is either $1 - a/n$, $1 - c/n$, or $1 - b/n$, depending on $\sigma_u$ and $\sigma_v$; the third equality holds because $(|A| + |s_A|)\,\left|s_C - (2p-1)|C|\right| \le 2\alpha_n|B| = o(n)$; the last equality holds for some $K(\sigma_A, |C|)$ which only depends on $\sigma_A$ and $|C|$. As a consequence,

F{a,G,Sn} = (l + o(l))2- K{aA,,\C\) <!>b,b ^c,C ^dG\C • 


It follows that 


F{au = x,G\aQGt,ac,Sn} = il + oil))2-^ Y, K{aA,\G\) ^b,b 

<^A\{u} 


and thus 
= (1 + o(l)) 2 “"- Y B:(aA, IC'D ‘^b,b Y l|h( 7 l<n 0 - 6 |-
By Bayes’ rule,
Hence, the desired (22) follows on event $\mathcal{E}_n$.


□ 


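Proof of Lemma 3.7. In view of the definition of $p^*_G$ given in (3),

$$p^*_G = \frac{1}{2} - \frac{1}{2}\mathbb{E}\left[\left|\mathbb{P}\{\sigma_u = + \mid G\} - \mathbb{P}\{\sigma_u = - \mid G\}\right|\right].$$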

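Consider estimating $\sigma_u$ based on $G$. For any $t \in \mathbb{N}$, suppose a genie reveals the labels of all vertices whose distance from $u$ is precisely $t$, and let $\widehat{\sigma}_{\mathrm{oracle},t}$ denote the optimal oracle estimator given by

$$\widehat{\sigma}_{\mathrm{oracle},t}(u) = 2 \times \mathbf{1}\left\{\mathbb{P}\{\sigma_u = + \mid G, \sigma_{\partial G^t}\} \ge \mathbb{P}\{\sigma_u = - \mid G, \sigma_{\partial G^t}\}\right\} - 1.$$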

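Let $p_G(\widehat{\sigma}_{\mathrm{oracle},t})$ denote the error probability of the oracle estimator, which is given by

$$p_G(\widehat{\sigma}_{\mathrm{oracle},t}) = \frac{1}{2} - \frac{1}{2}\mathbb{E}\left[\left|\mathbb{P}\{\sigma_u = + \mid G, \sigma_{\partial G^t}\} - \mathbb{P}\{\sigma_u = - \mid G, \sigma_{\partial G^t}\}\right|\right].$$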

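Since $\widehat{\sigma}_{\mathrm{oracle},t}(u)$ is optimal with the extra information $\sigma_{\partial G^t}$, it follows that $p_G(\widehat{\sigma}_{\mathrm{oracle},t}) \le p^*_G$ for all $t$ and $n$. Lemma A.1 implies that there exists a sequence of events $\mathcal{E}_n$ such that $\mathbb{P}\{\mathcal{E}_n\} \to 1$ and, on event $\mathcal{E}_n$,

$$\mathbb{P}\{\sigma_u = x \mid G, \sigma_{\partial G^t}\} = (1 + o(1))\,\mathbb{P}\{\sigma_u = x \mid G^t, \sigma_{\partial G^t}\}, \quad \forall x \in \{\pm\},$$

and $(G^t, \sigma_{G^t}) = (T^t, \tau_{T^t})$. It follows that

$$\begin{aligned}
p_G(\widehat{\sigma}_{\mathrm{oracle},t}) &= \frac{1}{2} - \frac{1}{2}\mathbb{E}\left[\left|\mathbb{P}\{\sigma_u = + \mid G, \sigma_{\partial G^t}\} - \mathbb{P}\{\sigma_u = - \mid G, \sigma_{\partial G^t}\}\right|\mathbf{1}\{\mathcal{E}_n\}\right] + o(1) \\
&= \frac{1}{2} - \frac{1}{2}\mathbb{E}\left[\left|\mathbb{P}\{\sigma_u = + \mid G^t, \sigma_{\partial G^t}\} - \mathbb{P}\{\sigma_u = - \mid G^t, \sigma_{\partial G^t}\}\right|\mathbf{1}\{\mathcal{E}_n\}\right] + o(1) \\
&= \frac{1}{2} - \frac{1}{2}\mathbb{E}\left[\left|\mathbb{P}\{\tau_u = + \mid T^t, \tau_{\partial T^t}\} - \mathbb{P}\{\tau_u = - \mid T^t, \tau_{\partial T^t}\}\right|\mathbf{1}\{\mathcal{E}_n\}\right] + o(1) \\
&= \frac{1}{2} - \frac{1}{2}\mathbb{E}\left[\left|\mathbb{P}\{\tau_u = + \mid T^t, \tau_{\partial T^t}\} - \mathbb{P}\{\tau_u = - \mid T^t, \tau_{\partial T^t}\}\right|\right] + o(1) \\
&= p^*_{T^t} + o(1).
\end{aligned}$$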


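Hence,

$$\limsup_{n \to \infty}\left(p^*_G - p^*_{T^t}\right) \ge \limsup_{n \to \infty}\left(p_G(\widehat{\sigma}_{\mathrm{oracle},t}) - p^*_{T^t}\right) = 0.$$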
A.3 Proof of Lemma 4.1

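We first prove the claims for $Z^{t+1}_-$. By the definition of $\Gamma^t_u$ and the change of measure, we have

$$\mathbb{E}\left[g(\Gamma^t_u) \mid \tau_u = -\right] = \mathbb{E}\left[g(\Gamma^t_u)\,e^{-2\Gamma^t_u} \mid \tau_u = +\right],$$

where $g$ is any measurable function such that the expectations above are well-defined. It follows that

$$\mathbb{E}\left[g(Z^t_-)\right] = \mathbb{E}\left[g(Z^t_+)\,e^{-2Z^t_+}\right].$$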
By conditioning on the label of vertex $u$ being $-$, it follows that


E [Zi+I] = + 1 iog(6/c)d_ + 1^!!^ (p6E [f{ZX)] + -pd& [/(Zi)]) 

- (p6E [/2(Z^)] + pcE [/2(Zi)]) + O (6|e"^ - 1 \^) . 

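In view of (24), we have that

$$p b\,\mathbb{E}\left[f(Z^t_+)\right] + \bar{p} c\,\mathbb{E}\left[f(Z^t_-)\right] = p b\,\mathbb{E}\left[f(Z^t_+)\left(1 + e^{-2Z^t_+}\frac{\bar{p}c}{pb}\right)\right] = p b, \qquad (26)$$

$$p b\,\mathbb{E}\left[f^2(Z^t_+)\right] + \bar{p} c\,\mathbb{E}\left[f^2(Z^t_-)\right] = p b\,\mathbb{E}\left[f^2(Z^t_+)\left(1 + e^{-2Z^t_+}\frac{\bar{p}c}{pb}\right)\right] = p b\,\mathbb{E}\left[f(Z^t_+)\right]. \qquad (27)$$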


Hence, 


E [Zi+i] = + 1 iog(6/c)d_ + ^^^^^^pb - [f{Zi)] + O {b\e^^ - 1|- 
As a consequence,
where the last equality holds due to $(a - b)/\sqrt{b} = \mu$ and $(c - b)/\sqrt{b} = \nu$ for fixed constants $\mu$ and $\nu$. Moreover,
and 6|e^^ − 1|3 = 0(6 ^/2), Assembling the last four displayed equations gives that
It follows that


E[zi+'] =-p^lv + 


{2p-l)v‘^ p{p + vf 
I _|_ g-2(2^+</3)_




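where in the last equality we used the fact that $\frac{1}{1 + e^{-2x}} = \frac{1}{2}(\tanh(x) + 1)$. Recall that $\lambda = \frac{p(\mu+\nu)^2}{8}$ and $\theta = \frac{p(\mu-\nu)^2}{8} + \frac{(1-2p)\nu^2}{4}$. Therefore, we get the desired equality:

$$\mathbb{E}\left[Z^{t+1}_-\right] = -\theta - \lambda\,\mathbb{E}\left[\tanh(Z^t_+ + \varphi)\right] + O(b^{-1/2}).$$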

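Next we calculate $\mathrm{var}(Z^{t+1}_-)$. For $Y = \sum_{i=1}^{L} X_i$, where $L$ is Poisson distributed and independent of the $X_i$, and $\{X_i\}$ are i.i.d. with finite second moments, one can check that $\mathrm{var}(Y) = \mathbb{E}[L]\,\mathbb{E}[X_i^2]$. In view of (11),

$$\mathrm{var}\left(Z^{t+1}_-\right) = p b\,\mathbb{E}\left[F^2(Z^t_+)\right] + \bar{p} c\,\mathbb{E}\left[F^2(Z^t_-)\right].$$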
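The identity $\mathrm{var}(Y) = \mathbb{E}[L]\,\mathbb{E}[X_i^2]$ for a Poisson random sum is what turns the variance computation into a sum over the two child types. A small Monte Carlo check, with parameter values chosen arbitrarily for the illustration, is sketched below. The Poisson sampler is Knuth's multiplication method, used here only because Python's standard library has no Poisson generator.

```python
import math
import random


def poisson(rng, lam):
    """Knuth's multiplication method for a Pois(lam) draw (fine for moderate lam)."""
    threshold = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k


def compound_poisson_moments(lam, values, probs, n_samples=40000, seed=1):
    """Empirical mean and variance of Y = X_1 + ... + X_L, L ~ Pois(lam)."""
    rng = random.Random(seed)
    ys = []
    for _ in range(n_samples):
        length = poisson(rng, lam)
        ys.append(sum(rng.choices(values, probs, k=length)))
    mean = sum(ys) / n_samples
    var = sum((y - mean) ** 2 for y in ys) / (n_samples - 1)
    return mean, var


lam, values, probs = 8.0, [0.5, -0.3], [0.6, 0.4]
second_moment = sum(w * x * x for x, w in zip(values, probs))  # E[X^2] = 0.186
mean, var = compound_poisson_moments(lam, values, probs)
print(var, lam * second_moment)  # empirical variance vs E[L] E[X^2]
```

Note that the variance involves the second moment $\mathbb{E}[X_i^2]$ rather than $\mathrm{var}(X_i)$, because the randomness of $L$ contributes $\mathrm{var}(L)\,(\mathbb{E}[X_i])^2 = \mathbb{E}[L]\,(\mathbb{E}[X_i])^2$.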

In view of (25) and the fact that e^^ — 1 = o(l), we have that 

F'^ix) = ^ log^ ^ ^ \ ~^) “ ^og{b/c))f{x) + O - 1|^) , 

Thus, 

(zi+^) = ^log^ ^ ~ [fi^-)]] 

- 1| 


var 


+ 4 " 


- 1) (1 - log(6/c)) [pbV. [/=(Zi)] + pc-e. lf(Z‘_)]] + O (6|e'‘'’ 
Applying (26) and (27), we get that 
•(1 - log(6/c)) = p{p + i/)2 + 0 ( 1 ).


Moreover, we have shown that b\e^P ~ Ip = 0{b ^/^). Assembling the last three displayed equations 
give that 


t+u _ + v) , p{p + z^)^ 
E[/(Z^)] +0(5-i/2).

























Finally, in view of (29), we get that 
^ ~ 4 2 4
= 0 + AE [tanh(Z^ + ip)] +


E [tanh(Z^ + p)] + 0 ( 6 -i/ 2 ) 


The claims for $Z^{t+1}_+$ can be proved similarly as above. We provide another proof by exploiting the symmetry. In particular, note that our tree model is parameterized by $(p, a, b, c)$ with labels $+$ and $-$. Consider another parametrization $(p', a', b', c')$ with labels $+'$ and $-'$, where $p' = \bar{p}$, $a' = c$, $b' = b$, $c' = a$, $+' = -$, and $-' = +$. Let $Z^t_{+'}$ and $Z^t_{-'}$ denote the random variables corresponding to $Z^t_+$ and $Z^t_-$, respectively. Then, one can check that $Z^t_{+'}$ has the same distribution as $-Z^t_-$, and $Z^t_{-'}$ has the same distribution as $-Z^t_+$. We have shown that
E [Z!+^] =


p(p-zz)2 (2p-l)p2 p{p + vf 


8 4 

Applying (24) with $g(x) = \tanh(x - \varphi)$, we get that
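$$= \theta + \lambda\,\mathbb{E}\left[\tanh(Z^t_+ + \varphi)\right] + O(b^{-1/2}).$$

Finally, note that

$$\mathrm{var}\left(Z^{t+1}_+\right) = \mathrm{var}\left(Z^{t+1}_{-'}\right) = -\mathbb{E}\left[Z^{t+1}_{-'}\right] + O(b^{-1/2}) = \mathbb{E}\left[Z^{t+1}_+\right] + O(b^{-1/2}).$$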


Combining the last two displayed equations completes the proof.
A.4 Proof of Lemma 4.2


The following lemma is useful for proving that the distributions of $Z^t_+$ and $Z^t_-$ are approximately Gaussian.


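Lemma A.2 (Analog of the Berry-Esseen inequality for Poisson sums [28, Theorem 3]). Let $S_d = X_1 + \cdots + X_{N_d}$, where $X_i : i \ge 1$ are independent, identically distributed random variables with finite second moment, and for some $d > 0$, $N_d$ is a $\mathrm{Pois}(d)$ random variable independent of $(X_i : i \ge 1)$. Then

$$\sup_x \left| \mathbb{P}\left\{ \frac{S_d - \mathbb{E}[S_d]}{\sqrt{\mathrm{var}(S_d)}} \le x \right\} - \mathbb{P}\{Z \le x\} \right| \le \frac{C_{\mathrm{BE}}\,\mathbb{E}\left[|X_1|^3\right]}{\sqrt{d\left(\mathbb{E}\left[X_1^2\right]\right)^3}},$$

where $\mathbb{E}[S_d] = d\,\mathbb{E}[X_1]$, $\mathrm{var}(S_d) = d\,\mathbb{E}[X_1^2]$, and $C_{\mathrm{BE}} = 0.3041$.

Proof of Lemma 4.2. We prove the lemma by induction over $t$. We first consider the base case. For $Z^t_\pm$, the base case $t = 0$ trivially holds, because $\Gamma^0_u = 0$ and $v_0 = 0$. For $W^t_\pm$, we need to check the base case $t = 1$. Recall that $\Lambda^0_i = \infty$ if $\tau_i = +$ and $\Lambda^0_i = -\infty$ if $\tau_i = -$. Notice that $F(\infty) = \frac{1}{2}\log(a/b)$ and $F(-\infty) = \frac{1}{2}\log(b/c)$. Hence,

$$\Lambda^1_u = -\frac{d_+ - d_-}{2} + \sum_{i=1}^{N_d} X_i, \qquad (31)$$

where, conditional on $\tau_u = \pm$, $N_d \sim \mathrm{Pois}(d_\pm)$ is independent of $\{X_i\}$; $\{X_i\}$ are i.i.d. such that, conditional on $\tau_u = +$, $X_i = \frac{1}{2}\log(a/b)$ with probability $pa/d_+$ and $X_i = \frac{1}{2}\log(b/c)$ with probability $\bar{p}b/d_+$; conditional on $\tau_u = -$, $X_i = \frac{1}{2}\log(a/b)$ with probability $pb/d_-$ and $X_i = \frac{1}{2}\log(b/c)$ with probability $\bar{p}c/d_-$. Taylor expansion yields that

$$\log(a/b) = \log\left(1 + \frac{\mu}{\sqrt{b}}\right) = \frac{\mu}{\sqrt{b}} - \frac{\mu^2}{2b} + O\left(b^{-3/2}\right),$$

$$\log(b/c) = -\log\left(1 + \frac{\nu}{\sqrt{b}}\right) = -\frac{\nu}{\sqrt{b}} + \frac{\nu^2}{2b} + O\left(b^{-3/2}\right).$$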
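The two Taylor expansions can be checked numerically: after subtracting the first two terms, the remainder multiplied by $b^{3/2}$ should stay bounded, and by the next term of $\log(1+x) = x - x^2/2 + x^3/3 - \cdots$ it tends to $\mu^3/3$ and $-\nu^3/3$, respectively. A minimal sketch with the parameterization $a = b + \mu\sqrt{b}$ and $c = b + \nu\sqrt{b}$; `math.log1p` is used instead of `math.log(a / b)` only to avoid cancellation for large $b$.

```python
import math


def scaled_remainders(mu, nu, b):
    """b^{3/2} times the remainders of the two second-order expansions."""
    sb = math.sqrt(b)
    log_ab = math.log1p(mu / sb)    # log(a/b) with a = b + mu sqrt(b)
    log_bc = -math.log1p(nu / sb)   # log(b/c) with c = b + nu sqrt(b)
    rem_ab = log_ab - (mu / sb - mu ** 2 / (2 * b))
    rem_bc = log_bc - (-nu / sb + nu ** 2 / (2 * b))
    return b ** 1.5 * rem_ab, b ** 1.5 * rem_bc


for b in (1e2, 1e4, 1e6):
    print(b, scaled_remainders(2.0, -1.0, b))  # tends to (mu^3/3, -nu^3/3)
```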

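Since $F$ is monotone,

$$\mathbb{E}\left[X_1^2\right] \ge \min\left\{\tfrac{1}{4}\log^2(a/b), \tfrac{1}{4}\log^2(b/c)\right\} = \Omega(b^{-1}),$$

$$\mathbb{E}\left[|X_1|^3\right] \le \max\left\{\tfrac{1}{8}|\log(a/b)|^3, \tfrac{1}{8}|\log(b/c)|^3\right\} = O(b^{-3/2}).$$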

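Thus, in view of Lemma A.2, we get that

$$\sup_x \left| \mathbb{P}\left\{ \frac{W^1_\pm - \mathbb{E}[W^1_\pm]}{\sqrt{\mathrm{var}(W^1_\pm)}} \le x \right\} - \mathbb{P}\{Z \le x\} \right| \le O(b^{-1/2}).$$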


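By conditioning on the label of $u$ being $-$, it follows from (31) that

$$\mathbb{E}\left[W^1_-\right] = \frac{1}{2}\left[-d_+ + d_- + \log(a/b)\,pb + \log(b/c)\,\bar{p}c\right] = -\frac{p\mu^2 + \bar{p}\nu^2}{4} + O(b^{-1/2}) = -w_1 + O(b^{-1/2}),$$

$$\mathrm{var}\left(W^1_-\right) = \frac{1}{4}\log^2(a/b)\,pb + \frac{1}{4}\log^2(b/c)\,\bar{p}c = \frac{p\mu^2 + \bar{p}\nu^2}{4} + O(b^{-1/2}) = w_1 + O(b^{-1/2}),$$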
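where we used the fact that $w_1 = \theta + \lambda = (p\mu^2 + \bar{p}\nu^2)/4$ by definition. Similarly, by conditioning on the label of $u$ being $+$, it follows that

$$\mathbb{E}\left[W^1_+\right] = \frac{1}{2}\left[-d_+ + d_- + \log(a/b)\,pa + \log(b/c)\,\bar{p}b\right] = \frac{p\mu^2 + \bar{p}\nu^2}{4} + O(b^{-1/2}) = w_1 + O(b^{-1/2}),$$

$$\mathrm{var}\left(W^1_+\right) = \frac{1}{4}\log^2(a/b)\,pa + \frac{1}{4}\log^2(b/c)\,\bar{p}b = \frac{p\mu^2 + \bar{p}\nu^2}{4} + O(b^{-1/2}) = w_1 + O(b^{-1/2}).$$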


Hence, we get the desired equality:

$$\sup_x \left| \mathbb{P}\left\{ \frac{W^1_\pm \mp w_1}{\sqrt{w_1}} \le x \right\} - \mathbb{P}\{Z \le x\} \right| \le O(b^{-1/2}).$$
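The base-case moments can also be evaluated exactly for finite $b$ and compared with $\pm w_1$ and $w_1$. The sketch below is an illustration with arbitrary parameter values: it plugs $a = b + \mu\sqrt{b}$ and $c = b + \nu\sqrt{b}$ into the four exact expressions above, and the printed gaps shrink roughly like $b^{-1/2}$.

```python
import math


def lambda1_moments(mu, nu, p, b):
    """Exact mean/variance of Lambda_u^1 given tau_u = + and tau_u = -, from (31)."""
    q = 1.0 - p
    sb = math.sqrt(b)
    a, c = b + mu * sb, b + nu * sb
    d_plus, d_minus = p * a + q * b, p * b + q * c
    la, lc = math.log(a / b), math.log(b / c)
    mean_plus = 0.5 * (-d_plus + d_minus + la * p * a + lc * q * b)
    var_plus = 0.25 * (la ** 2 * p * a + lc ** 2 * q * b)
    mean_minus = 0.5 * (-d_plus + d_minus + la * p * b + lc * q * c)
    var_minus = 0.25 * (la ** 2 * p * b + lc ** 2 * q * c)
    return mean_plus, var_plus, mean_minus, var_minus


mu, nu, p = 2.0, 1.0, 0.3
w1 = (p * mu ** 2 + (1 - p) * nu ** 2) / 4.0  # w_1 = theta + lambda
for b in (1e2, 1e4, 1e6):
    m_p, v_p, m_m, v_m = lambda1_moments(mu, nu, p, b)
    print(b, m_p - w1, v_p - w1, m_m + w1, v_m - w1)
```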


In view of (10) and (11), $\Lambda^t$ and $\Gamma^t$ satisfy the same recursion. Moreover, by definition, $v_t$ and $w_t$ also satisfy the same recursion. Thus, to finish the proof of the lemma, it suffices to show that if (15) holds for $t$, then it also holds for $t + 1$. We prove the claim for $Z^{t+1}_-$; the claim for $Z^{t+1}_+$ follows similarly. In view of the recursion given in (11),

$$Z^{t+1}_- = -\frac{d_+ - d_-}{2} + \sum_{i=1}^{N_d} Y_i,$$

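where $N_d \sim \mathrm{Pois}(d_-)$ is independent of $\{Y_i\}$; $\{Y_i\}$ are i.i.d. such that $Y_i = F(Z^t_+)$ with probability $pb/d_-$ and $Y_i = F(Z^t_-)$ with probability $\bar{p}c/d_-$. Since $F$ is monotone, $F(\infty) = \frac{1}{2}\log(a/b)$, and $F(-\infty) = \frac{1}{2}\log(b/c)$, it follows that

$$\mathbb{E}\left[Y_1^2\right] \ge \min\left\{\tfrac{1}{4}\log^2(a/b), \tfrac{1}{4}\log^2(b/c)\right\} = \Omega(b^{-1}),$$

$$\mathbb{E}\left[|Y_1|^3\right] \le \max\left\{\tfrac{1}{8}|\log(a/b)|^3, \tfrac{1}{8}|\log(b/c)|^3\right\} = O(b^{-3/2}).$$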

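In view of Lemma A.2, we get that

$$\sup_x \left| \mathbb{P}\left\{ \frac{Z^{t+1}_- - \mathbb{E}[Z^{t+1}_-]}{\sqrt{\mathrm{var}(Z^{t+1}_-)}} \le x \right\} - \mathbb{P}\{Z \le x\} \right| = O(b^{-1/2}). \qquad (32)$$

It follows from Lemma 4.1 that

$$\mathbb{E}\left[Z^{t+1}_-\right] = -\theta - \lambda\,\mathbb{E}\left[\tanh(Z^t_+ + \varphi)\right] + O(b^{-1/2}),$$

$$\mathrm{var}\left(Z^{t+1}_-\right) = \theta + \lambda\,\mathbb{E}\left[\tanh(Z^t_+ + \varphi)\right] + O(b^{-1/2}).$$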

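Using the area rule of expectation, we have that

$$\begin{aligned}
\mathbb{E}\left[\tanh(Z^t_+ + \varphi)\right] &= \int_0^{\infty} \tanh'(s)\,\mathbb{P}\{Z^t_+ + \varphi > s\}\,\mathrm{d}s - \int_{-\infty}^{0} \tanh'(s)\,\mathbb{P}\{Z^t_+ + \varphi < s\}\,\mathrm{d}s \\
&= \int_0^{\infty} \tanh'(s)\,\mathbb{P}\{v_t + \sqrt{v_t}\,Z + \varphi > s\}\,\mathrm{d}s - \int_{-\infty}^{0} \tanh'(s)\,\mathbb{P}\{v_t + \sqrt{v_t}\,Z + \varphi < s\}\,\mathrm{d}s + O(b^{-1/2}) \\
&= \mathbb{E}\left[\tanh\left(v_t + \sqrt{v_t}\,Z + \varphi\right)\right] + O(b^{-1/2}),
\end{aligned}$$

where the second equality follows from the induction hypothesis and the fact that $0 < \tanh'(s) \le 1$ with $\int_{-\infty}^{\infty}\tanh'(s)\,\mathrm{d}s = 2$. Recall that $v_{t+1} = \theta + \lambda\,\mathbb{E}[\tanh(v_t + \sqrt{v_t}\,Z + \varphi)]$. Hence, $\mathbb{E}[Z^{t+1}_-] = -v_{t+1} + O(b^{-1/2})$ and $\mathrm{var}(Z^{t+1}_-) = v_{t+1} + O(b^{-1/2})$. As a consequence, in view of (32), the desired (15) holds for $Z^{t+1}_-$. □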
A.5 Proof of Lemma 4.3

By definition,

$$h(v) = \mathbb{E}\left[\tanh\left(v + \sqrt{v}\,Z + \varphi\right)\right].$$


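Since $|\tanh(x)| \le 1$, the continuity of $h$ follows from the dominated convergence theorem. We next show that $h'(v)$ exists for $v \in (0, \infty)$. Notice that

$$\frac{\mathrm{d}}{\mathrm{d}x}\tanh\left(x + \sqrt{x}\,Z + \varphi\right) = \left(1 - \tanh^2\left(x + \sqrt{x}\,Z + \varphi\right)\right)\left(1 + \frac{x^{-1/2}Z}{2}\right)$$

for $x \in (0, \infty)$, and

$$\left|\left(1 - \tanh^2\left(x + \sqrt{x}\,Z + \varphi\right)\right)\left(1 + \frac{x^{-1/2}Z}{2}\right)\right| \le 1 + \frac{x^{-1/2}|Z|}{2}.$$

Since $|Z|$ is integrable, by the dominated convergence theorem, $\mathbb{E}\left[\frac{\mathrm{d}}{\mathrm{d}x}\tanh(x + \sqrt{x}\,Z + \varphi)\right]$ exists and is continuous in $x$ over $(0, \infty)$. Moreover, since $x^{-1/2}$ is integrable near $0$, the map $x \mapsto \mathbb{E}\left[\frac{\mathrm{d}}{\mathrm{d}x}\tanh(x + \sqrt{x}\,Z + \varphi)\right]$ is integrable over $(0, v]$ for every $v > 0$.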
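It follows that

$$\begin{aligned}
h(v) &= \mathbb{E}\left[\tanh(\varphi) + \int_0^v \frac{\mathrm{d}}{\mathrm{d}x}\tanh\left(x + \sqrt{x}\,Z + \varphi\right)\mathrm{d}x\right] \\
&= \tanh(\varphi) + \int_0^v \mathbb{E}\left[\frac{\mathrm{d}}{\mathrm{d}x}\tanh\left(x + \sqrt{x}\,Z + \varphi\right)\right]\mathrm{d}x,
\end{aligned}$$

where the second equality holds due to Fubini's theorem. Hence,

$$h'(v) = \mathbb{E}\left[\left(1 - \tanh^2\left(v + \sqrt{v}\,Z + \varphi\right)\right)\left(1 + \frac{v^{-1/2}Z}{2}\right)\right].$$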


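Using integration by parts (with $\frac{\mathrm{d}}{\mathrm{d}x}e^{-x^2/2} = -x\,e^{-x^2/2}$), we can get that

$$\begin{aligned}
\mathbb{E}\left[\left(1 - \tanh^2(v + \sqrt{v}\,Z + \varphi)\right)\sqrt{v}\,Z\right]
&= \int_{-\infty}^{\infty} \left(1 - \tanh^2(v + \sqrt{v}\,x + \varphi)\right)\frac{\sqrt{v}\,x}{\sqrt{2\pi}}\,e^{-x^2/2}\,\mathrm{d}x \\
&= \left[-\left(1 - \tanh^2(v + \sqrt{v}\,x + \varphi)\right)\frac{\sqrt{v}}{\sqrt{2\pi}}\,e^{-x^2/2}\right]_{-\infty}^{+\infty} + \int_{-\infty}^{\infty} \frac{\mathrm{d}}{\mathrm{d}x}\left(1 - \tanh^2(v + \sqrt{v}\,x + \varphi)\right)\frac{\sqrt{v}}{\sqrt{2\pi}}\,e^{-x^2/2}\,\mathrm{d}x \\
&= -2v\,\mathbb{E}\left[\tanh(v + \sqrt{v}\,Z + \varphi)\left(1 - \tanh^2(v + \sqrt{v}\,Z + \varphi)\right)\right].
\end{aligned}$$

By combining the last two displayed equations, we obtain the expression for $h'(v)$ stated in Lemma 4.3; in particular, $h'(v) > 0$ because $|\tanh(x)| < 1$.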

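Next, we prove the concavity of $h$ in the special case $\varphi = 0$. We will use the following equality, which comes from the change of measure: for $k \in \mathbb{N}$,

$$\mathbb{E}\left[\tanh^{2k-1}\left(\sqrt{v}\,Z + v\right)\right] = \mathbb{E}\left[\tanh^{2k}\left(\sqrt{v}\,Z + v\right)\right].$$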


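It follows from the expression for $h'$ derived above (with $\varphi = 0$) that

$$\begin{aligned}
h'(v) &= \mathbb{E}\left[\left(1 - \tanh(v + \sqrt{v}\,Z)\right)\left(1 - \tanh^2(v + \sqrt{v}\,Z)\right)\right] \\
&= \mathbb{E}\left[\left(1 - \tanh^2(\sqrt{v}\,Z + v)\right)^2\right] \\
&= \mathbb{E}\left[\left(1 - \tanh^2\left(\sqrt{v}\,|Z + \sqrt{v}|\right)\right)^2\right],
\end{aligned}$$

where the second equality follows from the above equality with $k = 1$ and $k = 2$, and the last equality holds because $\tanh^2(x)$ is even in $x$. For $0 < v < w < \infty$ and all $z \in \mathbb{R}$,

$$\tanh^2\left(\sqrt{w}\,|z + \sqrt{v}|\right) \ge \tanh^2\left(\sqrt{v}\,|z + \sqrt{v}|\right).$$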
Thus

$$h'(v) = \mathbb{E}\left[\left(1 - \tanh^2\left(\sqrt{v}\,|Z + \sqrt{v}|\right)\right)^2\right] \ge \mathbb{E}\left[\left(1 - \tanh^2\left(\sqrt{w}\,|Z + \sqrt{v}|\right)\right)^2\right]. \qquad (33)$$


Let $X = |Z + \sqrt{v}|$ and $Y = |Z + \sqrt{w}|$. Then for any $x \ge 0$,

$$\mathbb{P}\{X \le x\} = \mathbb{P}\{-x - \sqrt{v} \le Z \le x - \sqrt{v}\} \ge \mathbb{P}\{-x - \sqrt{w} \le Z \le x - \sqrt{w}\} = \mathbb{P}\{Y \le x\}.$$

Hence, $X$ is first-order stochastically dominated by $Y$. Since $\left(1 - \tanh^2(\sqrt{w}\,x)\right)^2$ is non-increasing in $x$ for $x \ge 0$, it follows that


E 


(l — tanh^ (^\/wX))‘ 


> E 


(l — tanh^ (^y/wY)y 
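The stochastic-dominance step can be checked numerically. For $0 < v < w$, the minimal Python sketch below evaluates $\mathbb{P}\{|Z + \sqrt{v}| \le x\}$ and $\mathbb{P}\{|Z + \sqrt{w}| \le x\}$ through the standard normal CDF written with `math.erf`, and verifies the inequality on a grid of $x$ values; the function names and the grids are hypothetical illustration choices.

```python
import math


def std_normal_cdf(x):
    """Phi(x) = P{Z <= x} for Z ~ N(0, 1)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def cdf_abs_shifted(x, s):
    """P{|Z + s| <= x} = P{-x - s <= Z <= x - s}, for x >= 0."""
    return std_normal_cdf(x - s) - std_normal_cdf(-x - s)


def dominance_holds(v, w, grid):
    """Check P{X <= x} >= P{Y <= x} on a grid, X = |Z + sqrt(v)|, Y = |Z + sqrt(w)|."""
    return all(
        cdf_abs_shifted(x, math.sqrt(v)) >= cdf_abs_shifted(x, math.sqrt(w))
        for x in grid
    )


print(dominance_holds(0.5, 2.0, [0.1 * k for k in range(1, 60)]))
```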


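Thus by (33),
$$h'(v) \ge \mathbb{E}\left[\left(1 - \tanh^2\left(\sqrt{w}\,|Z + \sqrt{w}|\right)\right)^2\right] = h'(w).$$
Hence $h'$ is non-increasing on $(0,\infty)$, and therefore $h$ is concave.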
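The two facts behind concavity in the case $\varphi = 0$ can also be checked numerically: the change-of-measure identity $\mathbb{E}[\tanh^{2k}(\sqrt{v}Z + v)] = \mathbb{E}[\tanh^{2k-1}(\sqrt{v}Z + v)]$, and the monotonicity of $h'(v) = \mathbb{E}[(1 - \tanh^2(\sqrt{v}Z + v))^2]$. This is a minimal Python sketch assuming NumPy, with Gauss-Hermite quadrature standing in for the Gaussian expectation; the helper names and the grid are hypothetical.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

# Probabilists' Gauss-Hermite rule: weight exp(-x^2/2), weights sum to sqrt(2*pi).
NODES, WEIGHTS = hermegauss(100)
NORM = np.sqrt(2.0 * np.pi)


def tanh_moment(v, power):
    """E[tanh(sqrt(v) Z + v) ** power] for Z ~ N(0, 1), by quadrature."""
    t = np.tanh(np.sqrt(v) * NODES + v)
    return float(np.sum(WEIGHTS * t ** power) / NORM)


def h_prime_phi0(v):
    """h'(v) = E[(1 - tanh^2(sqrt(v) Z + v))^2], the phi = 0 formula derived above."""
    t = np.tanh(np.sqrt(v) * NODES + v)
    return float(np.sum(WEIGHTS * (1.0 - t ** 2) ** 2) / NORM)


# Change-of-measure identity: moments of orders 2k - 1 and 2k agree.
for v in (0.25, 0.5, 1.0):
    print(v, [tanh_moment(v, 2 * k) - tanh_moment(v, 2 * k - 1) for k in (1, 2, 3)])

# Concavity of h: h' should be non-increasing along a grid of v values.
grid = np.linspace(0.1, 1.5, 15)
values = [h_prime_phi0(v) for v in grid]
print(all(x >= y for x, y in zip(values, values[1:])))
```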


A.6 Proof of Lemma 4.4

The proof is very similar to the proof of Lemma 3.7; the key new challenge is that $\mathbb{E}[O(\sigma, \hat\sigma)]$ does not directly reduce to the error probability of estimating $\sigma_u$ based on the graph $G$. We need a key lemma.

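Lemma A.3. Fix any $t \ge 1$ and any two different vertices $i$ and $j$. For any estimator $\hat\sigma(G)$ of $\sigma$ based on $G$,
$$\mathbb{E}\left[\left(\mathbf{1}_{\{\hat\sigma_i \neq \sigma_i\}} - \frac12\right)\left(\mathbf{1}_{\{\hat\sigma_j \neq \sigma_j\}} - \frac12\right)\right] \le \left(\frac12 - p_{T^t}\right)^2 + o(1). \tag{34}$$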


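Proof. Fix any $t \ge 1$. Recall that $G_u^t$ denotes the subgraph induced by vertices whose distance from $u$ is at most $t$, and $\partial G_u^t$ denotes the set of vertices whose distance from $u$ is precisely $t$. Let $(T_i, \tau_{T_i})$ and $(T_j, \tau_{T_j})$ denote two independent copies of the tree model with $p = 1/2$ and $a = c$ defined in Definition 3.1. The coupling lemma given in Lemma 3.5 and Lemma A.1 can be strengthened so that there exists a sequence of events $\mathcal{E}_n$ such that $\mathbb{P}\{\mathcal{E}_n\} \to 1$ and, on the event $\mathcal{E}_n$, $(G_i^t, \sigma_{G_i^t}) = (T_i^t, \tau_{T_i^t})$, $(G_j^t, \sigma_{G_j^t}) = (T_j^t, \tau_{T_j^t})$, and
$$\mathbb{P}\{\sigma_i, \sigma_j \mid G, \sigma_{\partial G_i^t}, \sigma_{\partial G_j^t}\} = (1 + o(1))\,\mathbb{P}\{\sigma_i \mid G, \sigma_{\partial G_i^t}\}\,\mathbb{P}\{\sigma_j \mid G, \sigma_{\partial G_j^t}\}, \tag{35}$$
$$\mathbb{P}\{\sigma_i \mid G, \sigma_{\partial G_i^t}\} = (1 + o(1))\,\mathbb{P}\{\sigma_i \mid G_i^t, \sigma_{\partial G_i^t}\}, \tag{36}$$
$$\mathbb{P}\{\sigma_j \mid G, \sigma_{\partial G_j^t}\} = (1 + o(1))\,\mathbb{P}\{\sigma_j \mid G_j^t, \sigma_{\partial G_j^t}\}. \tag{37}$$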

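For $u = i, j$, define
$$X_u = \mathbb{P}\{\sigma_u = + \mid G, \sigma_{\partial G_u^t}\} - \mathbb{P}\{\sigma_u = - \mid G, \sigma_{\partial G_u^t}\},$$
$$Y_u = \mathbb{P}\{\tau_u = + \mid T_u^t, \tau_{\partial T_u^t}\} - \mathbb{P}\{\tau_u = - \mid T_u^t, \tau_{\partial T_u^t}\}.$$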
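Then for any estimator $\hat\sigma(G)$, we have that
$$\begin{aligned}
&\mathbb{E}\left[\left(\mathbf{1}_{\{\hat\sigma_i \neq \sigma_i\}} - \tfrac12\right)\left(\mathbf{1}_{\{\hat\sigma_j \neq \sigma_j\}} - \tfrac12\right)\right]\\
&\quad= \mathbb{E}\left[\mathbb{E}\left[\left(\mathbf{1}_{\{\hat\sigma_i \neq \sigma_i\}} - \tfrac12\right)\left(\mathbf{1}_{\{\hat\sigma_j \neq \sigma_j\}} - \tfrac12\right) \,\middle|\, G, \sigma_{\partial G_i^t}, \sigma_{\partial G_j^t}\right]\mathbf{1}_{\{\mathcal{E}_n\}}\right] + o(1)\\
&\quad= \mathbb{E}\left[\mathbb{E}\left[\mathbf{1}_{\{\hat\sigma_i \neq \sigma_i\}} - \tfrac12 \,\middle|\, G, \sigma_{\partial G_i^t}\right]\mathbb{E}\left[\mathbf{1}_{\{\hat\sigma_j \neq \sigma_j\}} - \tfrac12 \,\middle|\, G, \sigma_{\partial G_j^t}\right]\mathbf{1}_{\{\mathcal{E}_n\}}\right] + o(1)\\
&\quad\le \tfrac14\,\mathbb{E}\left[|X_i|\,|X_j|\,\mathbf{1}_{\{\mathcal{E}_n\}}\right] + o(1)\\
&\quad= \tfrac14\,\mathbb{E}\left[|Y_i|\,|Y_j|\,\mathbf{1}_{\{\mathcal{E}_n\}}\right] + o(1) = \tfrac14\,\mathbb{E}\left[|Y_i|\,|Y_j|\right] + o(1)\\
&\quad= \tfrac14\,\mathbb{E}[|Y_i|]\,\mathbb{E}[|Y_j|] + o(1) = \left(\tfrac12 - p_{T^t}\right)^2 + o(1),
\end{aligned}$$
where the first and fourth equalities follow from $\mathbb{P}\{\mathcal{E}_n\} \to 1$; the second equality holds due to (35); the inequality holds because, for $u = i, j$, $\mathbb{E}[\mathbf{1}_{\{\hat\sigma_u \neq \sigma_u\}} - \frac12 \mid G, \sigma_{\partial G_u^t}]$ equals $\mathbb{P}\{\sigma_u \neq x \mid G, \sigma_{\partial G_u^t}\} - \frac12$ at $x = \hat\sigma_u$, and this probability is maximized at $x = -$ if $X_u > 0$ and at $x = +$ if $X_u < 0$, so the factor is at most $|X_u|/2$ in absolute value; the third equality holds due to (36), (37), $(G_i^t, \sigma_{G_i^t}) = (T_i^t, \tau_{T_i^t})$, and $(G_j^t, \sigma_{G_j^t}) = (T_j^t, \tau_{T_j^t})$; the fifth equality follows because $(T_i, \tau_{T_i})$ and $(T_j, \tau_{T_j})$ are independent; and the last equality follows because $p_{T^t} = 1/2 - \mathbb{E}[|Y_i|]/2$ by definition. Hence we get the desired (34). □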

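Proof of Lemma 4.4. Fix any estimator $\hat\sigma(G)$. Notice that by the definition of $O(\sigma, \hat\sigma)$,
$$O(\sigma, \hat\sigma) = \frac12 - \left|\frac1n\sum_{i \in [n]}\mathbf{1}_{\{\hat\sigma_i \neq \sigma_i\}} - \frac12\right|.$$
Therefore,
$$\mathbb{E}[O(\sigma, \hat\sigma)] = \frac12 - \mathbb{E}\left[\left|\frac1n\sum_{i \in [n]}\left(\mathbf{1}_{\{\hat\sigma_i \neq \sigma_i\}} - \frac12\right)\right|\right] \ge \frac12 - \left(\mathbb{E}\left[\left(\frac1n\sum_{i \in [n]}\left(\mathbf{1}_{\{\hat\sigma_i \neq \sigma_i\}} - \frac12\right)\right)^2\right]\right)^{1/2},$$
where we used the Cauchy-Schwarz inequality in the last step. Furthermore,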

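$$\mathbb{E}\left[\left(\frac1n\sum_{i \in [n]}\left(\mathbf{1}_{\{\hat\sigma_i \neq \sigma_i\}} - \frac12\right)\right)^2\right] = \frac{1}{4n} + \frac{n-1}{n}\,\mathbb{E}\left[\left(\mathbf{1}_{\{\hat\sigma_1 \neq \sigma_1\}} - \frac12\right)\left(\mathbf{1}_{\{\hat\sigma_2 \neq \sigma_2\}} - \frac12\right)\right],$$
where we used $\left(\mathbf{1}_{\{\hat\sigma_i \neq \sigma_i\}} - 1/2\right)^2 = 1/4$ for the diagonal terms and the symmetry among vertices for the off-diagonal terms. Applying Lemma A.3, we get that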

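$$\mathbb{E}\left[\left(\frac1n\sum_{i \in [n]}\left(\mathbf{1}_{\{\hat\sigma_i \neq \sigma_i\}} - \frac12\right)\right)^2\right] \le \left(\frac12 - p_{T^t}\right)^2 + o(1). \tag{38}$$
Combining (38) with the Cauchy-Schwarz lower bound on $\mathbb{E}[O(\sigma, \hat\sigma)]$ above and noticing that $p_{T^t} \le 1/2$, we get the desired inequality $\mathbb{E}[O(\sigma, \hat\sigma)] \ge p_{T^t} + o(1)$. Since $\hat\sigma$ is arbitrary, it follows that $\inf_{\hat\sigma}\mathbb{E}[O(\sigma, \hat\sigma)] \ge p_{T^t} + o(1)$, and the proof is complete. □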
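The misclassification measure $O(\sigma, \hat\sigma) = \frac12 - |\frac1n\sum_i \mathbf{1}_{\{\hat\sigma_i \neq \sigma_i\}} - \frac12|$ used in this proof can be computed directly. The minimal Python sketch below (assuming NumPy, labels encoded as $\pm 1$, function name hypothetical) illustrates that it equals the smaller of the two error fractions and is therefore invariant under a global sign flip of $\hat\sigma$.

```python
import numpy as np


def overlap_error(sigma, sigma_hat):
    """O(sigma, sigma_hat) = 1/2 - |(1/n) * #{i : sigma_hat_i != sigma_i} - 1/2|.

    Labels are +1/-1. The value equals min(err, 1 - err), where err is the
    fraction of disagreements, so flipping every sign of sigma_hat leaves it unchanged.
    """
    sigma = np.asarray(sigma)
    sigma_hat = np.asarray(sigma_hat)
    err = float(np.mean(sigma_hat != sigma))
    return 0.5 - abs(err - 0.5)


sigma = np.array([1, 1, -1, -1])
print(overlap_error(sigma, [1, -1, -1, -1]), overlap_error(sigma, [-1, 1, 1, 1]))
```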
A.7 Proof of Lemma 4.5

Before the main proof, we need a key lemma, which gives a recursive formula for $\Gamma^t$ on the tree model. Its proof is almost identical to the proof of Lemma 3.4 and is thus omitted.

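Lemma A.4. For $t \ge 0$,
$$\Gamma_u^{t+1} = \sum_{i \in \partial u}\frac12\log\left(\frac{\exp(2\Gamma_i^t)\,a + b}{\exp(2\Gamma_i^t)\,b + a}\right), \tag{39}$$
with $\Gamma_i^0 = \frac12\log\frac{1-\alpha}{\alpha}$ if $\tilde\tau_i = +$ and $\Gamma_i^0 = -\frac12\log\frac{1-\alpha}{\alpha}$ if $\tilde\tau_i = -$.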
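The recursion (39), for $p = 1/2$ and $a = c$, can be transcribed directly. The minimal Python sketch below (standard library only) evaluates $\Gamma$ at the root of a finite tree from noisy labels $\tilde\tau$ on the vertices at depth $t$. The tree encoding (a dictionary from a vertex to its list of children) and all function names are hypothetical illustration choices, not taken from the paper.

```python
import math


def message(x, a, b):
    """One summand of (39): (1/2) log((e^{2x} a + b) / (e^{2x} b + a)), case a = c."""
    e = math.exp(2.0 * x)
    return 0.5 * math.log((e * a + b) / (e * b + a))


def gamma(children, noisy_labels, node, depth, t, alpha, a, b):
    """Gamma at `node` (at distance `depth` from the root) for a tree of depth t.

    children: dict mapping each vertex to the list of its children.
    noisy_labels: dict mapping each depth-t vertex to +1 or -1 (the noisy label).
    """
    if depth == t:
        # Initialization: Gamma^0 = +/- (1/2) log((1 - alpha) / alpha).
        return noisy_labels[node] * 0.5 * math.log((1.0 - alpha) / alpha)
    # Internal vertex: sum the messages from its children (empty sum = 0).
    return sum(
        message(gamma(children, noisy_labels, c, depth + 1, t, alpha, a, b), a, b)
        for c in children.get(node, [])
    )


# Root 0 with children 1, 2; vertex 1 has leaves 3, 4; vertex 2 has leaf 5.
tree = {0: [1, 2], 1: [3, 4], 2: [5]}
labels = {3: 1, 4: 1, 5: -1}
print(gamma(tree, labels, 0, 0, 2, 0.2, 5.0, 3.0))
```

The message map is odd in $x$, so flipping every noisy label flips the sign of $\Gamma$; a vertex without children before depth $t$ contributes nothing.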

Let $V^+ = \{i \in V : \sigma_i = +\}$ and $V^- = \{i \in V : \sigma_i = -\}$. For an Erdős-Rényi random graph with edge probability $b/n$, it is well known that if $b \to \infty$ and $b = o(\log n)$, the maximum degree is at least $\frac{\log n}{\log(\log n/b)}$ with high probability (see [20, Appendix A] for a proof). Thus, with high probability, at least one vertex in $U$ has more than $\frac{\log n}{2\log(\log n/b)}$ neighbors in $V \setminus U$, so that $u^*$ is well defined. Due to the symmetry between $+$ and $-$, without loss of generality, assume that $\sigma_{u^*} = +$. By the assumption that $|\mu| > 2$ and $b = o(\log n)$, it is proved in [36, Lemma 5.7] that there exist an $\alpha \in (0, 1/2)$ and a polynomial-time estimator such that for any $u \in V \setminus U$, when we apply the estimator in Step 3.1 of Algorithm 2, its output satisfies $|W_u^+ \Delta V^+| \le \alpha n$ and $|W_u^- \Delta V^-| \le \alpha n$ after the relabeling defined in Step 3.2 of Algorithm 2. Recall that $\alpha_u$ is the fraction of vertices misclassified by the partition $(W_u^+, W_u^-)$. Thus, $\alpha_u \le \alpha$.

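Fix a vertex $u \in V \setminus U$. For all $i \in \partial G_u^t$, let $\tilde\sigma_i = +$ if $i \in W_u^+$ and $\tilde\sigma_i = -$ if $i \in W_u^-$ after the labeling defined in Step 3.2 of Algorithm 2. It is argued in [36, Section 5.2] that for each $i \in \partial G_u^t$, independently at random, $\tilde\sigma_i = \sigma_i$ with probability $1 - \alpha_u$, and $\tilde\sigma_i = -\sigma_i$ with probability $\alpha_u$. Consider the tree model $(T_u, \tau)$ with $p = 1/2$ and $a = c$, where for each vertex $i \in T_u$, independently at random, $\tilde\tau_i = \tau_i$ with probability $1 - \alpha_u$ and $\tilde\tau_i = -\tau_i$ with probability $\alpha_u$. By the coupling lemma given in Lemma 3.5, we can construct a coupling such that $(G_u^t, \sigma_{G_u^t}, \tilde\sigma_{\partial G_u^t}) = (T_u^t, \tau_{T_u^t}, \tilde\tau_{\partial T_u^t})$ holds with probability converging to 1. Moreover, on the event $(G_u^t, \tilde\sigma_{\partial G_u^t}) = (T_u^t, \tilde\tau_{\partial T_u^t})$, we have $R_u^t = \Gamma_u^t$ in view of the definition of $R_u^t$ given in Algorithm 2 and the recursive formula for $\Gamma_u^t$ given in Lemma A.4. Hence,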


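$$p_G(\hat\sigma_{\mathrm{BP}}) = q_{T^t}(\alpha_u) + o(1),$$
where the $o(1)$ term comes from the coupling error. Since $q_{T^t}(\alpha)$ is non-decreasing in $\alpha$ and $\alpha_u \le \alpha$, it follows that
$$p_G(\hat\sigma_{\mathrm{BP}}) \le q_{T^t}(\alpha) + o(1).$$
By the definition of $O(\sigma, \hat\sigma)$ given in (6),
$$\mathbb{E}[O(\hat\sigma_{\mathrm{BP}}, \sigma)] \le p_G(\hat\sigma_{\mathrm{BP}}),$$
and the lemma follows by combining the last two displayed equations.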

A.8 Proof of Lemma 4.6

Recall that $u_{t+1} = \frac{\mu^2}{4}\mathbb{E}\left[\tanh(u_t + \sqrt{u_t}Z)\right]$ with $u_1 = (1 - 2\alpha)^2\mu^2/4$. In the case $p = 1/2$ and $\mu = \nu$, we have $\theta = 0$ and $\lambda = \mu^2/4$, and hence $u_t$ and $v_t$ satisfy the same recursion. Also, comparing (39) to (11), the recursion (39) coincides with (11) when $p = 1/2$ and $\mu = \nu$. Therefore, in view of the proof of Lemma 4.2, to prove the lemma it suffices to show that in the base case $t = 1$,


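$$\sup_x\left|\mathbb{P}\left\{\frac{Z_\pm^1 \mp u_1}{\sqrt{u_1}} \le x\right\} - \mathbb{P}\{Z \le x\}\right| = O(b^{-1/2}), \tag{40}$$
where $Z_\pm^1$ denotes $\Gamma_u^1$ conditional on $\tau_u = \pm$.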


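Recall that $\Gamma_i^0 = \frac12\log\frac{1-\alpha}{\alpha}$ if $\tilde\tau_i = +$ and $\Gamma_i^0 = -\frac12\log\frac{1-\alpha}{\alpha}$ if $\tilde\tau_i = -$. Also, for all $i \in \partial u$, independently at random, $\tilde\tau_i = \tau_i$ with probability $1 - \alpha$, and $\tilde\tau_i = -\tau_i$ with probability $\alpha$. Let
$$x^* = \frac12\log\frac{(1-\alpha)a + \alpha b}{(1-\alpha)b + \alpha a}, \qquad d = \frac{a+b}{2}, \qquad p = \frac{\alpha a + (1-\alpha)b}{a+b}.$$
Thus, in view of the recursion given in (39), $\Gamma_u^1 = \sum_{i=1}^{L_u} X_i$, where conditional on $\tau_u = \pm$, $L_u \sim \mathrm{Pois}(d)$ is independent of $\{X_i\}$, and the $X_i$ are i.i.d. such that conditional on $\tau_u = +$, $X_i = x^*$ with probability $1 - p$ and $X_i = -x^*$ with probability $p$; conditional on $\tau_u = -$, $X_i = x^*$ with probability $p$ and $X_i = -x^*$ with probability $1 - p$. Taylor expansion yields that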

$$x^* = \frac{(1 - 2\alpha)(a - b)}{2b} + O(b^{-1}).$$

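By conditioning on the label of $u$ being $-$, it follows that
$$\mathbb{E}\left[\Gamma_u^1 \mid \tau_u = -\right] = -d(1 - 2p)x^* = -\frac{(1 - 2\alpha)^2(a - b)^2}{4b} + O(b^{-1/2}) = -u_1 + O(b^{-1/2}),$$
$$\mathrm{var}\left(\Gamma_u^1 \mid \tau_u = -\right) = d(x^*)^2 = \frac{(1 - 2\alpha)^2(a - b)^2(a + b)}{8b^2} + O(b^{-1/2}) = u_1 + O(b^{-1/2}).$$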
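The mean and variance computations above can be checked numerically. The minimal Python sketch below (standard library only) takes $a = c = b + \mu\sqrt{b}$ as in the setting of this lemma, evaluates $-d(1-2p)x^*$ and $d(x^*)^2$ exactly, and compares them with $u_1 = (1 - 2\alpha)^2\mu^2/4$; the values $\mu = 3$, $\alpha = 0.2$ and the range of $b$ are arbitrary illustrations, and the function name is hypothetical.

```python
import math


def base_case_moments(mu, b, alpha):
    """Return (E[Gamma_u^1 | tau_u = -], var(Gamma_u^1 | tau_u = -), u_1) for a = c = b + mu*sqrt(b)."""
    a = b + mu * math.sqrt(b)
    x_star = 0.5 * math.log(((1 - alpha) * a + alpha * b) / ((1 - alpha) * b + alpha * a))
    d = (a + b) / 2.0                             # Poisson mean of the number of children
    p = (alpha * a + (1 - alpha) * b) / (a + b)   # probability that X_i has the "wrong" sign
    mean_minus = -d * (1 - 2 * p) * x_star        # compound Poisson mean given tau_u = -
    var = d * x_star ** 2                         # compound Poisson variance d * E[X_i^2]
    u1 = (1 - 2 * alpha) ** 2 * mu ** 2 / 4.0
    return mean_minus, var, u1


for b in (1e2, 1e4, 1e6):
    m, v, u1 = base_case_moments(3.0, b, 0.2)
    print(b, m + u1, v - u1)
```

Both gaps should shrink roughly like $b^{-1/2}$, in line with the $O(b^{-1/2})$ terms above.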

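In view of Lemma A.2, we get that
$$\sup_x\left|\mathbb{P}\left\{\frac{Z_-^1 - \mathbb{E}[Z_-^1]}{\sqrt{\mathrm{var}(Z_-^1)}} \le x\right\} - \mathbb{P}\{Z \le x\}\right| \le O(b^{-1/2}).$$
Combined with the estimates $\mathbb{E}[Z_-^1] = -u_1 + O(b^{-1/2})$ and $\mathrm{var}(Z_-^1) = u_1 + O(b^{-1/2})$ above, this proves (40) for $Z_-^1$. By the symmetry between $-$ and $+$, the desired (40) also holds for $Z_+^1$.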
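Assuming the recursion $u_{t+1} = \frac{\mu^2}{4}\mathbb{E}[\tanh(u_t + \sqrt{u_t}Z)]$ for $p = 1/2$ and $\mu = \nu$, the sketch below illustrates numerically that iterating it from two different positive starting values, $(1 - 2\alpha)^2\mu^2/4$ and $\mu^2/4$, leads to the same fixed point. It is a minimal Python sketch assuming NumPy, with Gauss-Hermite quadrature in place of the exact Gaussian expectation; $\mu = 3$, $\alpha = 0.2$ and the number of steps are arbitrary illustrative choices, and the function names are hypothetical.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

# Probabilists' Gauss-Hermite rule: weight exp(-x^2/2), weights sum to sqrt(2*pi).
NODES, WEIGHTS = hermegauss(100)
NORM = np.sqrt(2.0 * np.pi)


def de_step(u, lam):
    """One density-evolution step u -> lam * E[tanh(u + sqrt(u) Z)], Z ~ N(0, 1)."""
    return float(lam * np.sum(WEIGHTS * np.tanh(u + np.sqrt(u) * NODES)) / NORM)


def iterate(u_start, lam, steps=200):
    """Apply de_step `steps` times, starting from u_start > 0."""
    u = u_start
    for _ in range(steps):
        u = de_step(u, lam)
    return u


mu, alpha = 3.0, 0.2
lam = mu ** 2 / 4.0
u_from_noisy = iterate((1 - 2 * alpha) ** 2 * lam, lam)   # start at (1 - 2 alpha)^2 mu^2 / 4
u_from_other = iterate(lam, lam)                          # start at mu^2 / 4
print(u_from_noisy, u_from_other)
```

With $\mu^2/4 > 1$ the map has a unique positive fixed point: $u \mapsto \mathbb{E}[\tanh(u + \sqrt{u}Z)]$ is concave with slope $1$ at $0$, so $\frac{\mu^2}{4}\mathbb{E}[\tanh(u + \sqrt{u}Z)]$ crosses the diagonal exactly once, which is why both runs agree.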


B A Data-driven Choice of the Parameter $\alpha$ in Algorithm 2

Algorithm 2 requires the knowledge of $\alpha_u$, which is given by $\alpha_u = |W_u^+ \Delta V^+|/n = |W_u^- \Delta V^-|/n$. In this section, we show that there exists an efficient estimator $\hat\alpha_u$ such that $\hat\alpha_u = \alpha_u + o(1)$ with high probability. Our estimation procedure is given in Algorithm 3.

Lemma B.1. Let $\hat\alpha_u$ be the output of Algorithm 3. Then with probability converging to 1, $\hat\alpha_u = \alpha_u + o(1)$.

Proof. We assume $a > b$ in the proof; the case $a < b$ can be proved similarly. Let $k^* = \frac{\log n}{2\log(\log n/b)}$. For any vertex $i$ in $T$, let $d_i$ denote its number of neighbors in $S$. Then $d_i$ is stochastically lower bounded by $\mathrm{Binom}(|S|, b/n)$. Since $|S| = \lfloor n/\log b \rfloor$, it follows that (see [20, Appendix A] for a proof)
$$-\log\mathbb{P}\{d_i \ge k^*\} \le \frac12\log n - \log\log n.$$
Algorithm 3 Estimation of $\alpha_u$

1: Take $U \subset V$ to be a random subset of size and $S \subset V$ to be a random subset of size $\lfloor n/\log b \rfloor$. Let $u^* \in U$ be a random vertex in $U$ with at least $\frac{\log n}{2\log(\log n/b)}$ neighbors in $V \setminus U \setminus S$.

2: For $u \in V \setminus U \setminus S$ do

1. Run a polynomial-time estimator capable of correlated recovery on the subgraph induced by vertices not in and $U \cup S$, and let $W_u^+$ and $W_u^-$ denote the output partition.

2. Relabel $W_u^+$ and $W_u^-$ such that if $a > b$, then $u^*$ has more neighbors in $W_u^+$ than in $W_u^-$; otherwise, $u^*$ has more neighbors in $W_u^-$ than in $W_u^+$.

3. Take $T \subset W_u^+ \cup W_u^-$ to be a random subset of size $\lfloor\sqrt{n}\rfloor$. Let $T_0 \subset T$ denote the set of vertices with at least $\frac{\log n}{2\log(\log n/b)}$ neighbors in $S$. Let $T_1$ denote a random subset of $T_0$ with size .

4. Run a polynomial-time estimator capable of correlated recovery on the subgraph induced by vertices not in and $U \cup T$. Let $\widetilde{W}_u^+$ and $\widetilde{W}_u^-$ denote the output partition. Relabel $(\widetilde{W}_u^+, \widetilde{W}_u^-)$ in the same way as $(W_u^+, W_u^-)$.

5. Let $T^+$ consist of the vertices $i \in T_1$ with more neighbors in $\widetilde{W}_u^+$ than in $\widetilde{W}_u^-$, and let $T^- = T_1 \setminus T^+$. Define
$$\hat\alpha_u = \frac{|T^+ \cap W_u^-| + |T^- \cap W_u^+|}{|T_1|}.$$


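Because $\{d_i\}_{i \in T}$ are independent and, by the last display, $\mathbb{P}\{d_i \ge k^*\} \ge \log n/\sqrt{n}$, the cardinality of $T_0$ is stochastically lower bounded by $\mathrm{Binom}(\lfloor\sqrt{n}\rfloor, \log n/\sqrt{n})$. Therefore, with high probability $|T_0| \ge \frac12\log n$, and thus $T_1$ is well defined.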

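Define the event $\mathcal{E}_1$ to be that $T^+ = T_1 \cap V^+$ and $T^- = T_1 \cap V^-$. We claim that $\mathbb{P}\{\mathcal{E}_1\} \to 1$. In fact, fix any vertex $i \in T_1$, suppose $\sigma_i = +$ without loss of generality, and let $\mathcal{N}(i)$ denote the set of its neighbors in $V \setminus T$. Let $\delta$ be such that
$$|\widetilde{W}_u^+ \Delta V^+| \le \delta n, \qquad |\widetilde{W}_u^- \Delta V^-| \le \delta n.$$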

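Then $\delta \in [0, 1/2)$. Notice that $d_i$ is independent of the partition $(\widetilde{W}_u^+, \widetilde{W}_u^-)$. Thus, conditional on $d_i$, $|\mathcal{N}(i) \cap \widetilde{W}_u^+|$ is stochastically lower bounded by a binomial random variable with $d_i$ trials, and $|\mathcal{N}(i) \cap \widetilde{W}_u^-|$ is stochastically upper bounded by a binomial random variable with $d_i$ trials. It follows from the Chernoff bound that, conditional on $d_i$,
$$\mathbb{P}\left\{|\mathcal{N}(i) \cap \widetilde{W}_u^+| < \cdots\right\} \le \cdots, \qquad \mathbb{P}\left\{|\mathcal{N}(i) \cap \widetilde{W}_u^-| > \cdots\right\} \le \cdots.$$
Since $d_i \ge k^*$ for all $i \in T_0$, it follows that
$$\mathbb{P}\left\{|\mathcal{N}(i) \cap \widetilde{W}_u^+| \le |\mathcal{N}(i) \cap \widetilde{W}_u^-|\right\} \le \cdots.$$
Applying the union bound, we get that
$$\mathbb{P}\left\{\exists\, i \in T_1 \cap V^+ : |\mathcal{N}(i) \cap \widetilde{W}_u^+| \le |\mathcal{N}(i) \cap \widetilde{W}_u^-|\right\} \le \cdots.$$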
By the assumption that $(a - b)/\sqrt{a} = O(1)$ and $b = o(\log n)$, the upper bound in the last display is $o(1)$. Hence, with high probability, for all $i \in T_1 \cap V^+$, $|\mathcal{N}(i) \cap \widetilde{W}_u^+| > |\mathcal{N}(i) \cap \widetilde{W}_u^-|$ and thus $i \in T^+$. Similarly, one can show that with high probability, for all $i \in T_1 \cap V^-$, $|\mathcal{N}(i) \cap \widetilde{W}_u^+| < |\mathcal{N}(i) \cap \widetilde{W}_u^-|$ and thus $i \in T^-$. Hence, $\mathbb{P}\{\mathcal{E}_1\} \to 1$.

Finally, we show that $\hat\alpha_u = \alpha_u + o(1)$ with high probability. Let $W_u = W_u^+ \cup W_u^-$. Notice that $T_1$ is randomly chosen and independent of the partition $(W_u^+, W_u^-)$. Hence each $i \in T_1$ lies in $(W_u^- \cap V^+) \cup (W_u^+ \cap V^-)$ with probability $\alpha_u$. Therefore,
$$|T_1 \cap V^+ \cap W_u^-| + |T_1 \cap V^- \cap W_u^+| \sim \mathrm{Binom}(|T_1|, \alpha_u).$$

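Define the event $\mathcal{E}_2$ to be

Then it follows from the Chernoff bound that $\mathbb{P}\{\mathcal{E}_2\} \to 1$. By the union bound, $\mathbb{P}\{\mathcal{E}_1 \cap \mathcal{E}_2\} \to 1$.

Notice that on the event $\mathcal{E}_1 \cap \mathcal{E}_2$,
$$\hat\alpha_u = \frac{|T_1 \cap V^+ \cap W_u^-| + |T_1 \cap V^- \cap W_u^+|}{|T_1|} = \alpha_u + o(1).$$
Therefore, we conclude that $\hat\alpha_u = \alpha_u + o(1)$ with high probability. □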
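Step 5 of Algorithm 3 and the final concentration step of this proof can be illustrated with a toy simulation. The minimal Python sketch below (standard library only) computes $\hat\alpha_u$ from given sets, then builds a synthetic partition whose misclassified fraction is exactly a chosen $\alpha_u$, draws $T_1$ independently of it, and sets $T^+ = T_1 \cap V^+$ as on the event $\mathcal{E}_1$. It generates no graph and runs no correlated-recovery estimator; the sizes, the planted error rate and all names are hypothetical.

```python
import random


def estimate_alpha(t1, t_plus, w_plus, w_minus):
    """Step 5 of Algorithm 3: (|T+ ∩ W-| + |T- ∩ W+|) / |T1| with T- = T1 minus T+."""
    t1 = set(t1)
    t_plus = set(t_plus) & t1
    t_minus = t1 - t_plus
    return (len(t_plus & set(w_minus)) + len(t_minus & set(w_plus))) / len(t1)


def simulate(n=100_000, alpha_u=0.2, size_t1=2_000, seed=0):
    """Toy run on a synthetic partition whose misclassified fraction is exactly alpha_u.

    T+ is set to T1 ∩ V+, i.e. we assume the event E_1 holds; no graph is generated.
    """
    rng = random.Random(seed)
    truth = [1 if i < n // 2 else -1 for i in range(n)]          # V+ is the first half
    flipped = set(rng.sample(range(n), int(alpha_u * n)))        # misclassified vertices
    w_plus = {i for i in range(n) if (truth[i] == 1) != (i in flipped)}
    w_minus = set(range(n)) - w_plus
    t1 = rng.sample(range(n), size_t1)                           # T1 drawn independently of W
    t_plus = [i for i in t1 if truth[i] == 1]
    return estimate_alpha(t1, t_plus, w_plus, w_minus)


print(simulate())
```

On $\mathcal{E}_1$ the estimate is the misclassified fraction of the random sample $T_1$, so its deviation from $\alpha_u$ is of order $|T_1|^{-1/2}$.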